Preface


The writing of this book started in 2018 as a small compendium written for 
the course “Multiple Antenna Communications” at Linköping University. The 
initial goal was to cover a few crucial aspects not included in the course book 
Fundamentals of Massive MIMO. The principle in the writing was to explain 
the fundamentals of the topic with as simple mathematics as possible while 
including all the practical insights we gathered as researchers in the field. For 
each year that passed, the compendium became 50 pages longer. We added a 
recap of the theoretical foundations that the topic builds on, practical aspects 
often overlooked by academia (e.g., polarization), and additional concepts 
needed in a prolonged version of the course given to doctoral students. During 
the COVID-19 pandemic, lecture recordings from the course were uploaded 
to YouTube, receiving thousands of views and many positive reviews. Hence, 
when we both moved to the KTH Royal Institute of Technology in 2021-2022
and stopped teaching the original course, we did not want to bury the
compendium in a digital folder. Instead, we decided to turn it into a complete 
textbook that can be shared with an international audience. 

As the original course’s syllabus no longer limited us, we could focus on 
writing the definitive introductory book on multiple-input multiple-output 
(MIMO) communications. A key motivation for us is that with the advent 
of fifth-generation (5G) mobile networks, MIMO technology is everywhere: 
each base station and mobile phone is equipped with antenna arrays capable 
of transmitting/receiving signals with controllable directivity. This feature 
leads to stronger signals, robustness against channel fading, and spatial multi- 
plexing that can drastically raise data rates. This is only the beginning of the 
MIMO saga because larger antenna arrays and higher frequency bands that 
can accommodate more antennas in the same enclosure are envisioned for 
future network generations. The MIMO technology affects the physical-layer 
transmissions and changes how resource allocation and network optimization 
are done. The same methodology also underpins emerging technologies such 
as reconfigurable intelligent surfaces (RIS) and integrated sensing and commu- 
nication (ISAC). Hence, we believe that anyone who will research or develop 
future wireless communication systems must understand the fundamentals of 
multiple antenna communications. The first textbooks on the topic were writ- 
ten 25 years ago, and the basic theory remains valid; yet many recent insights 
and methodologies are not covered in classic textbooks, new terminologies 
and hardware architectures have arisen, and some old concepts are outdated. 
This incentivized us to spend two years finalizing this textbook, including 
adding new chapters and numerous examples, exercises, and simulations that 
can be reproduced using MATLAB code available on the book’s website. 


How to Use This Book 


This book is primarily written as the course material for a first-year graduate- 
level course and builds on undergraduate courses on signals and systems, 
linear algebra, probability theory, and digital communications. We believe the 
book should also appeal to wireless engineers and researchers who want to 
broaden their knowledge base and learn specific methods and algorithms. 

Chapter 1 provides a high-level introduction and motivation to multiple 
antenna communications. To ensure that the reader remembers the essential 
results from the mentioned undergraduate courses, Chapter 2 summarizes the 
theoretical foundations used in later chapters. The basics of point-to-point 
MIMO communications between two transceivers equipped with multiple 
antennas are provided in Chapter 3. The theory is then expanded for static 
line-of-sight (LOS) channels in Chapter 4 and random non-LOS channels in 
Chapter 5. Next, we consider multi-user MIMO channels in Chapter 6, where 
a base station with multiple antennas serves multiple user devices. These 
chapters constitute the core of the book and should be included when it is 
used for teaching a course. If these chapters are too extensive, one can omit 
Section 4.5 on planar antenna arrays, Section 4.6 on polarization, Section 5.5 
on block-fading channels, and Section 5.6 on sparse multipath propagation. 

The last three chapters are mostly independent and cover three different 
topics. Chapter 7 extends the theory to wideband MIMO channels with or- 
thogonal frequency-division multiplexing (OFDM). The chapter also describes 
hybrid analog-digital implementation architectures and MIMO terminology 
that one might encounter elsewhere. Chapter 8 covers the basics of direction- 
of-arrival estimation, localization, and radar sensing using antenna arrays. 
We explain how these array signal processing topics connect to the MIMO 
communication theory from previous chapters. The book ends with Chapter 
9, which covers reconfigurable surfaces consisting of multiple antenna-like 
elements that can reflect signals in desirable ways to enhance communication 
channels. The basic theory borrows much from that described in previous 
chapters but comes with its own characteristics and constraints. 

We recommend solving exercises while reading the book. The answers are 
available online, and a solution manual is provided to instructors who use the 
book in their teaching—contact us to retrieve it. 

This is an introductory book, so there are many more advanced methodologies and 
applications to learn. If you want to dig deeper into the topic, we recommend 
the textbooks Massive MIMO Networks [1], Foundations of User-Centric 
Cell-Free Massive MIMO [2], and Fundamentals of Massive MIMO [3]. 
Chapter 1 


Introduction and Motivation 


The basic scenario in wireless communications is that of a transmit antenna 
that radiates an electromagnetic waveform that spreads out and eventually is 
measured by a receive antenna located at another geographical location. The 
transmitted waveform is designed to carry information that can be extracted 
by the receiver from its measured received signal. A combination of digital 
modulation and channel coding is used to generate the waveform and encode 
information into it, which is done in such a way that the receiver can extract 
it even if the signal is attenuated and distorted. 

There are many wireless technologies currently in use, such as the IEEE 
802.11 technology family for WiFi, the IEEE 802.15.1 family for Bluetooth, 
the 3GPP family with GSM/UMTS/LTE/NR for cellular (mobile) communi- 
cations [4], [5], and the competing but somewhat outdated 3GPP2 family with 
IS-95/CDMA2000/EV-DO. These technologies are based on open standards, 
created in collaboration between companies that jointly decide on the basic 
features but compete in building and selling commercial implementations. 
Some standards are designed to replace previous standards, targeting the 
same use cases. Other standards are optimized for different use cases—for 
example, long-range versus short-range communications, high data rate versus 
low power, or operation in licensed versus unlicensed frequency bands. 

This chapter first introduces the fundamental concepts of signal power, 
channel gain, and antenna directivity. Then the use of multiple antennas will 
be motivated by outlining three main benefits this technology can provide. 


1.1 Transmitted and Received Signal Power 


In the technologies mentioned above, the transmit power P varies substantially 
with the type of device, signal bandwidth, technology, and use case. The 
cellular base stations deployed on rooftops and towers might transmit tens 
of watts; for example, 40 W per 10 MHz of bandwidth is typical in 4G LTE 
systems [6]. Base stations deployed closer to the potential users might only 
transmit a few hundred milliwatts; for example, 0.1 W is typical for WiFi 
access points, and 0.4 W is a limit for local-area cellular base stations in 5G 
NR [7]. A cell (mobile) phone typically radiates up to 0.1 W, and a short-range 
Bluetooth transmitter might operate at only 1 mW = 0.001 W. The power 
of a transmitter connected to an electrical grid is often limited by national 
regulations, selected to enable coexistence between different wireless systems 
and limit human exposure to strong electromagnetic fields. There are also 
regulations on battery-powered user devices; however, the devices are also 
subject to more practical limitations, such as keeping the power down to 
alleviate the need for active cooling and make the battery last longer. While 
the numbers mentioned above are the maximum power, battery-powered 
devices can purposely reduce their power during transmission and turn the 
transceivers on/off with time to save energy, especially when the data rate 
the system supports is higher than the device requires for the moment. 

Due to the large transmit power variations, a decibel scale is often used to 
report the power numbers conveniently. In particular, the unit dBm is used 
to report the ratio between the signal power and 1 mW in decibels (dB): 

$$10 \log_{10}\left(\frac{\text{Signal power}}{1\,\mathrm{mW}}\right)\ \mathrm{dBm}, \tag{1.1}$$

where $\log_{10}(\cdot)$ is the base-10 logarithm. This means that 1 mW is equal to 
0 dBm, 0.1 W is 20 dBm, and 40 W is 46 dBm. We note that $10 \log_{10}(2) \approx 3$, 
$10 \log_{10}(4) \approx 6$, and $10 \log_{10}(8) \approx 9$. These approximations are often treated 
as being exact in the communication literature. Hence, doubling the signal 
power equals a 3 dB increase. 
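The conversion in (1.1) can be sketched in a few lines. This is a minimal illustration in Python (the book's own simulations use MATLAB), and the helper names `watt_to_dbm` and `dbm_to_watt` are our own, not from the text. It reproduces the example values 1 mW, 0.1 W, and 40 W quoted above.

```python
import math

def watt_to_dbm(power_w):
    """Convert a power in watts to dBm via (1.1): 10*log10(P / 1 mW)."""
    if power_w <= 0:
        raise ValueError("power must be positive to be expressed in dBm")
    return 10 * math.log10(power_w / 1e-3)

def dbm_to_watt(power_dbm):
    """Invert (1.1): P = 1 mW * 10^(P_dBm / 10)."""
    return 1e-3 * 10 ** (power_dbm / 10)

# Example values from the text
for p in (0.001, 0.1, 40.0):
    print(f"{p} W = {watt_to_dbm(p):.2f} dBm")

# The common approximation 10*log10(2) ~ 3 dB (doubling the power)
print(f"10*log10(2) = {10 * math.log10(2):.3f} dB")
```

Because the dBm value is only a logarithmic relabeling of the power, `dbm_to_watt(watt_to_dbm(p))` returns `p` up to floating-point round-off.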


Example 1.1. The decibel scale is generally used to measure the relative size 
of two power values. Compare $P_1 = 8$ W and $P_2 = 1$ W using the dBm unit. 

A direct computation based on (1.1) yields $P_1 \approx 39$ dBm and $P_2 = 30$ dBm 
because $10 \log_{10}(8/10^{-3}) \approx 39$ and $10 \log_{10}(1/10^{-3}) = 30$. The ratio $P_1/P_2$ is 
equal to 8, which can be expressed in decibels as 

$$10 \log_{10}\left(\frac{P_1}{P_2}\right) = 10 \log_{10}(8) \approx 9\ \mathrm{dB}. \tag{1.2}$$


This ratio can also be computed as $P_1$ [dBm] − $P_2$ [dBm] ≈ 39 − 30 = 9 dB, 
by first converting both numbers to dBm and then computing their difference. 
Note that the difference between 39 dBm and 30 dBm is expressed in dB, 
although their individual units are dBm. While dBm measures an absolute 
power value compared to 1 mW, dB is used to measure the relative ratio 
between two specific power values. In this example, we can say that $P_1$ is 
9 dB larger than $P_2$, or that $P_1$ is 8 times larger than $P_2$. 
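The distinction between dBm (absolute) and dB (relative) in Example 1.1 can be checked numerically. The following is a small Python sketch. The helper names `to_dbm` and `ratio_db` are hypothetical. It confirms that the difference of two dBm values equals the ratio (1.2) in dB.

```python
import math

def to_dbm(power_w):
    """Absolute power relative to 1 mW, as in (1.1)."""
    return 10 * math.log10(power_w / 1e-3)

def ratio_db(p1_w, p2_w):
    """Relative ratio between two powers, as in (1.2); the unit is dB, not dBm."""
    return 10 * math.log10(p1_w / p2_w)

p1, p2 = 8.0, 1.0  # powers from Example 1.1 [W]
p1_dbm, p2_dbm = to_dbm(p1), to_dbm(p2)
print(f"P1 = {p1_dbm:.2f} dBm, P2 = {p2_dbm:.2f} dBm")
print(f"ratio: {ratio_db(p1, p2):.2f} dB, "
      f"difference of dBm values: {p1_dbm - p2_dbm:.2f} dB")
```

The two printed dB numbers agree because $\log_{10}(a/b) = \log_{10}(a) - \log_{10}(b)$, so the 1 mW reference cancels in the difference.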


Figure 1.1: An isotropic transmit antenna radiates a signal that spreads like an inflatable 
sphere. At a propagation distance d in free space, the surface area of a sphere with radius d 
is $4\pi d^2$. This area is typically huge compared to the area $A_{\mathrm{iso}}$ of an isotropic receive 
antenna; thus, the receiver only captures a tiny fraction of the signal. 

A transmit antenna radiates an electromagnetic signal waveform that 
travels in all directions at the speed of light. The signal power is quickly 
dispersed over the surrounding environment; thus, the power measured by a 
receiving device is vastly smaller than the transmit power. One can 
picture this as if the signal power exists on the surface of a balloon. As we 
blow up the balloon, its radius grows, and the surface area 
becomes larger and larger, but the surface material also becomes thinner and 
thinner. When the signal waveform has traveled a distance d in free space, 
the signal power exists on a sphere with radius d, as illustrated in Figure 1.1. 
The surface area is $4\pi d^2$. If the power is equally distributed over the sphere's 
surface, the transmit antenna is said to be isotropic. This is also called a point 
source. Isotropic antennas are impossible to build¹ but are used for theoretical 
analysis and as a benchmark for other antennas, by measuring how closely a 
practical antenna approaches isotropic radiation. 

¹ The radiated field from an antenna must satisfy the Helmholtz wave equation, which 
originates from Maxwell's equations. One can prove that an isotropic field does not satisfy it. 
Even if one could build an isotropic antenna, it would not be practically useful since it must be 
connected to transceiver hardware that generates wireless signals. This connection would block 
the wave propagation in some directions. 

Figure 1.2: A sinusoid is a signal waveform characterized by its amplitude and frequency f 
[Hz]. The time period between two peaks is 1/f seconds. 

An elementary kind of signal waveform is the sinusoid illustrated in Fig- 
ure 1.2. This is an oscillating periodic function of time with a frequency 
denoted by f in this figure. The frequency represents the number of repeated 
periods per second observed at a specific location and is measured in Hertz 
(Hz). The period can be measured between two adjacent peaks observed in 
time and is 1/f seconds. When a sinusoidal electromagnetic wave propagates 
at the speed of light c m/s, at any given time instant, each period will cover 
a spatial interval of length c/f meters. This quantity is very important when 
analyzing how the wave interacts with objects in the surroundings, including 
antennas. It is called the signal's wavelength and will be denoted as $\lambda = c/f$. 
The speed of light is 299 792 458 m/s in free space (vacuum), but we will use 
the close approximation $c = 3 \cdot 10^8$ m/s throughout this book to enable a simple 
conversion between frequencies and wavelengths; for example, f = 3 GHz 
gives $\lambda = 0.1$ m. 
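The relation $\lambda = c/f$ with the approximation $c = 3 \cdot 10^8$ m/s can be sketched as follows. This is a minimal Python illustration, and the constant name `SPEED_OF_LIGHT` is our own. It reproduces the wavelengths of the three frequencies used throughout this chapter.

```python
SPEED_OF_LIGHT = 3e8  # m/s, the book's approximation of 299 792 458 m/s

def wavelength(freq_hz):
    """Wavelength lambda = c / f in meters."""
    return SPEED_OF_LIGHT / freq_hz

for f in (1e9, 3e9, 30e9):
    print(f"f = {f / 1e9:g} GHz -> lambda = {wavelength(f):g} m")
```

Using the exact value 299 792 458 m/s instead would change the wavelengths by less than 0.1%, which is why the rounded value is adequate for back-of-the-envelope calculations.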

The receive antenna converts the impinging electromagnetic waves into 
an electric current and can thereby be used to collect signal power. The 
power-capturing ability of an antenna is quantified by its effective area. It is 
defined as the ratio of the power that the antenna can collect (in W) to the 
power flux density of the incident wave (in W/m²) [8]. It can be proved that 
a hypothetical lossless isotropic antenna must have the effective area 

$$A_{\mathrm{iso}} = \frac{\lambda^2}{4\pi}, \tag{1.3}$$

where $\lambda$ is the wavelength of the type of waveform the antenna was built for. 
Since $\lambda = c/f$, the effective area in (1.3) can be equivalently expressed as 

$$A_{\mathrm{iso}} = \frac{c^2}{4\pi f^2}. \tag{1.4}$$
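Both forms of the isotropic effective area can be evaluated side by side. The following Python sketch assumes $c = 3 \cdot 10^8$ m/s, and its function names are hypothetical. It checks that (1.3) and (1.4) agree and shows how quickly the area shrinks with frequency.

```python
import math

C = 3e8  # speed of light approximation used in the book [m/s]

def a_iso_from_wavelength(wavelength_m):
    """Effective area of a lossless isotropic antenna, (1.3): lambda^2 / (4*pi)."""
    return wavelength_m ** 2 / (4 * math.pi)

def a_iso_from_frequency(freq_hz):
    """Equivalent form (1.4): c^2 / (4*pi*f^2)."""
    return C ** 2 / (4 * math.pi * freq_hz ** 2)

for f in (1e9, 3e9, 30e9):
    print(f"{f / 1e9:g} GHz: A_iso = {a_iso_from_frequency(f):.3e} m^2")
```

Since the area scales as $1/f^2$, going from 1 GHz to 30 GHz reduces it by a factor of $30^2 = 900$.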


This means that the higher the signal's frequency, the smaller the area of 
the matching isotropic antenna. The word “effective” in the term “effective 
area” refers to the following: Suppose a planar waveform travels in a given 
direction, and you place a surface perpendicular to that direction to block a 
part of the signal. The antenna captures the same power as would pass through 
that surface if its area equaled the antenna's effective area. This does not 
mean that a practical antenna must have that specific physical area; the 
relation between the two depends on the hardware implementation and 
deployment.² For example, if the antenna is not perpendicular to the direction 
in which the wave travels, the effective area is smaller than the physical area 
of the antenna. This is illustrated in Figure 1.3, where the receive antenna has 
the physical area A. Since the antenna is not deployed perpendicularly to the 
direction that the signal is traveling, the effective area is the projection of 
the physical antenna area in that direction. In the figure, the antenna is 
rotated by an angle $\varphi \in [-\pi/2, \pi/2]$; thus, the effective area is 
$A\cos(\varphi)$, which is smaller than or equal to the physical area. 

² The effective area of an aperture-type antenna is always less than or equal to its physical 
area. The aperture efficiency, which is the ratio of the maximum effective area (over all 
directions) to the physical area of an antenna, is an essential metric in antenna design [8]. 

Figure 1.3: The effective area of a receive antenna is generally smaller than the antenna's 
physical area. The physical area is A in this figure. However, the effective area $A\cos(\varphi)$ 
perpendicular to the direction that the signal propagates determines the received signal power. 
Any non-isotropic antenna has a varying effective area for different angular directions $\varphi$. 
The maximum effective area among all rotations is used as the reference value when comparing 
practical antennas of different kinds. 


Example 1.2. Consider a lossless isotropic antenna designed for the wavelength 
$\lambda = 0.1$ m (f = 3 GHz). What is the power captured by this antenna if the 
power flux density of the incident electromagnetic wave is 50 μW/m²? 

The answer is the product of the effective area and the power flux density: 

$$A_{\mathrm{iso}} \cdot 50 \cdot 10^{-6} = \frac{0.1^2}{4\pi} \cdot 50 \cdot 10^{-6} = \frac{1}{8\pi} \cdot 10^{-6} \approx 3.98 \cdot 10^{-8}\ \mathrm{W}. \tag{1.5}$$

Suppose a so-called short dipole replaces the isotropic antenna. This 
non-isotropic antenna captures different amounts of power depending on its 
rotation with respect to the incident wave. The maximum effective area among 
all rotations is used as the reference value when analyzing such an antenna. 
If we measure the received power over different rotations and notice that 
$5.96 \cdot 10^{-8}$ W is the maximum value, what is the maximum effective area? 

The effective area $A_{\mathrm{eff}}$ is the ratio of the captured power to the power flux 
density. In this case, it becomes 

$$A_{\mathrm{eff}} = \frac{5.96 \cdot 10^{-8}}{50 \cdot 10^{-6}} \approx 1.19 \cdot 10^{-3}\ \mathrm{m}^2, \tag{1.6}$$

which is approximately 1.5 times larger than $A_{\mathrm{iso}}$. 
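The two steps of Example 1.2 (power equals area times flux density, and area equals power divided by flux density) can be reproduced numerically. The following Python sketch only re-evaluates (1.5) and (1.6) with the numbers given in the example. The variable names are our own.

```python
import math

wavelength = 0.1        # m (f = 3 GHz)
flux_density = 50e-6    # incident power flux density [W/m^2]

a_iso = wavelength ** 2 / (4 * math.pi)   # isotropic effective area (1.3)
p_iso = a_iso * flux_density              # captured power (1.5)
print(f"Isotropic antenna captures {p_iso:.3e} W")

p_dipole_max = 5.96e-8                    # maximum measured power [W]
a_eff = p_dipole_max / flux_density       # maximum effective area (1.6)
print(f"A_eff = {a_eff:.3e} m^2, ratio to A_iso = {a_eff / a_iso:.3f}")
```

The ratio close to 1.5 matches the well-known directivity of a short dipole relative to an isotropic antenna.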


The black area in Figure 1.1 represents an isotropic receive antenna placed 
on the surface area of the sphere; that is, perpendicular to the direction that 
the transmitted waveform is traveling outwards from the origin. If the receive 
antenna is located at the distance d from the transmitter, its area $A_{\mathrm{iso}}$ in 
(1.3) should be compared with the total surface area $A_{\mathrm{sphere}}(d) = 4\pi d^2$ of a 
sphere with radius d. If $A_{\mathrm{sphere}}(d) > A_{\mathrm{iso}}$, the fraction of the transmit power 
that reaches the receive antenna is 

$$\frac{A_{\mathrm{iso}}}{A_{\mathrm{sphere}}(d)} = \frac{\lambda^2/(4\pi)}{4\pi d^2} = \frac{\lambda^2}{(4\pi)^2} \frac{1}{d^2}. \tag{1.7}$$

The factor $\lambda^2/(4\pi)^2$ is determined only by the wavelength, while the second 
factor is inversely proportional to the square of the propagation distance. This 
means that the signal power captured by the receive antenna decays rapidly 
with the distance d. Note that this example assumes so-called free-space 
propagation, which means there are no objects inside (or outside) the sphere 
in Figure 1.1 that interact with the radiated waveform to increase or decrease 
the received power. We will use this as the basic scenario in this book but 
also cover some other scenarios. The expression in (1.7) is a special case of 
the classical Friis’ transmission formula for free-space propagation [9], which 
also applies to other types of antennas than isotropic. 

The ratio in (1.7) is called the channel gain, while its inverse is called the 
pathloss.³ In this book, we often let the parameter $\beta$ denote the channel gain. 
This is a dimensionless parameter computed as the ratio between two areas. 
To get a sense of the typical size of the channel gain, Figure 1.4 shows its 
value as a function of the distance d for three different frequencies that are 
relevant for wireless communications: 


• f = 1 GHz with wavelength $\lambda = 0.3$ m; 
• f = 3 GHz with wavelength $\lambda = 0.1$ m; 
• f = 30 GHz with wavelength $\lambda = 0.01$ m. 


Since the channel gains are generally tiny, they are presented in the decibel 
scale in Figure 1.4; that is, the vertical axis presents 
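$$10 \log_{10}\left(\frac{\lambda^2}{(4\pi)^2} \frac{1}{d^2}\right) = 10 \log_{10}\left(\frac{\lambda^2}{(4\pi)^2}\right) + 10 \log_{10}\left(\frac{1}{d^2}\right). \tag{1.8}$$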

The curves start at a 1 m distance, where the channel gain is −42 dB at the 
3 GHz frequency. When increasing the distance by a factor of 10, from 1 m to 
10 m, the channel gain reduces by 20 dB to −62 dB. Hence, if we divide the 
transmit power into (roughly) one million parts, only one reaches the receive 
antenna. As seen from the last term in (1.8), the channel gain reduces by 
20 dB every time the distance increases by 10 times. Hence, another 20 dB is 
lost when the distance increases from 10 m to 100 m. 
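The free-space channel gain (1.7) and its decibel split (1.8) can be evaluated directly. The following is a minimal Python sketch with hypothetical helper names `free_space_gain` and `to_db`, assuming isotropic antennas as in the text. It reproduces the −42 dB and −62 dB values at 3 GHz and the 20 dB loss per decade of distance.

```python
import math

def free_space_gain(d_m, wavelength_m):
    """Free-space channel gain (1.7) between isotropic antennas (dimensionless)."""
    return wavelength_m ** 2 / ((4 * math.pi) ** 2 * d_m ** 2)

def to_db(x):
    """Convert a dimensionless power ratio to dB."""
    return 10 * math.log10(x)

# 3 GHz (lambda = 0.1 m): wavelength term plus distance term, as in (1.8)
for d in (1, 10, 100):
    wavelength_term = to_db(0.1 ** 2 / (4 * math.pi) ** 2)
    distance_term = to_db(1 / d ** 2)
    print(f"d = {d:>3} m: {to_db(free_space_gain(d, 0.1)):6.1f} dB "
          f"= {wavelength_term:.1f} dB + {distance_term:.1f} dB")
```

Only `wavelength_term` changes if the frequency changes, while `distance_term` is the same for every wavelength.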


³ It also happens that (1.7) is called the pathloss in the communication literature, so it is 
vital to know the dimensionality of this type of term to understand which definition is used in a 
particular text. Importantly, a wireless channel can only attenuate signals, so the channel gain 
must be smaller than or equal to 1, while its inverse must be greater than or equal to 1. 
Figure 1.4: The channel gain in (1.7) depends on the propagation distance d and the frequency 
f of the waveform, assuming that different matching isotropic antennas are used when commu- 
nicating at each of the considered frequencies. The channel gain is reported using the decibel 
scale since the variations are huge. 


Compared to communications at the 3 GHz frequency, the channel gain in 
Figure 1.4 is larger when using the lower frequency 1 GHz and smaller when 
using the higher frequency 30 GHz. This is purely due to the differences in 
the effective area in (1.3) for the corresponding isotropic receive antennas, 
which is proportional to $\lambda^2$. The waveforms are attenuated identically when 
propagating in free space irrespective of the frequency; that is, the power 
flux density at the receiver location is the same for all frequencies, but it is 
multiplied by different effective areas depending on the frequency band. In particular, it is only 
the first term in (1.8) that depends on the wavelength, while the distance- 
dependent second term is the same for any wavelength. 


Example 1.3. The channel gain with an isotropic receive antenna at f = 3 GHz 
and the distance d = 10 m is −62 dB, as shown in Figure 1.4. What is the 
corresponding channel gain if we replace the isotropic receive antenna with 
another antenna whose effective area is twice as large? What is the channel 
gain with this new antenna at a 100 m distance? 

The channel gain is proportional to the effective area, as can be seen from 
(1.7) where the effective area of an isotropic antenna is divided by the area of 
a sphere. If we double the effective area, the channel gain is doubled, and in 
the decibel scale, it becomes −62 + 3 = −59 dB at the 10 m distance. 

For the considered channel gain model in (1.8), there is a 20 dB gain 
reduction each time the distance increases by 10 times. Hence, the channel 
gain with the new antenna at a 100 m distance is −59 − 20 = −79 dB. 
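Example 1.3 can be checked by scaling the receive antenna's effective area in (1.7). In the following Python sketch, `area_factor` is a hypothetical parameter that multiplies the isotropic effective area, and `area_factor=2.0` represents the antenna with twice the effective area.

```python
import math

def gain_db(d_m, wavelength_m, area_factor=1.0):
    """Channel gain (1.7) in dB when the receive antenna's effective area is
    `area_factor` times that of an isotropic antenna."""
    a_rx = area_factor * wavelength_m ** 2 / (4 * math.pi)   # effective area [m^2]
    return 10 * math.log10(a_rx / (4 * math.pi * d_m ** 2))  # area ratio in dB

for d in (10, 100):
    print(f"d = {d} m: isotropic {gain_db(d, 0.1):.1f} dB, "
          f"doubled area {gain_db(d, 0.1, 2.0):.1f} dB")
```

The exact gain from doubling the area is $10\log_{10}(2) \approx 3.01$ dB, which the example rounds to 3 dB.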
[Figure 1.5 labels: Low-band (below 1 GHz), Mid-band (1-7 GHz), High-band (above 7 GHz)]


Figure 1.5: The radio frequency spectrum ranges from 3 kHz to 3000 GHz (i.e., 3 THz) and is 
used for many different services. The spectrum used for wireless communications is commonly 
divided into the low-band, mid-band, and high-band, as indicated in this figure. The high-band 
range 24-300 GHz is referred to as the mmWave band since the wavelength ranges from 12 to 
1mm. The range 300-3000 GHz is called the THz band. 


The three exemplified frequencies were selected to represent the three 
specific bands considered in 5G NR [10]. Most wireless communication systems 
operate in the part of the electromagnetic frequency spectrum called the radio 
spectrum, although there are exceptions.⁴ The range of frequencies counted as 
radio spectrum has changed over time as the applications and hardware have evolved. 
According to the 2020 regulations from the International Telecommunication Union (ITU) 
[11], the radio spectrum consists of all frequencies from 3 kHz to 3000 GHz. 
In the context of 5G NR, the spectrum is further divided into the low-band 
containing carrier frequencies up to 1 GHz, the mid-band in the range 1-7 GHz, 
and the high-band with frequencies above 7 GHz, as illustrated in Figure 1.5.⁵ 
The millimeter-wave (mmWave) band is a particularly prominent part of 
the high-band spectrum and, strictly speaking, covers 30-300 GHz, where 
the wavelength is between 10 and 1mm. For practical reasons, the mmWave 
band is typically said to start at 24 GHz since spectrum is available from that 
frequency in some countries. Moreover, only mmWave bands below 100 GHz 
are considered in 5G NR; thus, the range 100-300 GHz is often called the 
sub-THz band by researchers who want to differentiate future technologies 
from existing 5G solutions [13]. Finally, the range 300-3000 GHz is called the 
THz band since this range can also be expressed as 0.3-3 THz. 
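The band terminology above can be summarized as a small lookup function. This Python sketch is illustrative only. The function name `spectrum_labels` is hypothetical, and the boundary conventions are our assumptions because the text gives ranges without saying whether each endpoint is included: low-band up to and including 1 GHz, mid-band up to and including 7 GHz, mmWave from the practical 24 GHz start up to 300 GHz, sub-THz from 100 to 300 GHz, and THz from 300 GHz.

```python
def spectrum_labels(freq_hz):
    """Return the spectrum labels described in Section 1.1 for a carrier frequency.

    Assumed (not stated in the text) boundary conventions:
    low-band f <= 1 GHz, mid-band 1 GHz < f <= 7 GHz, high-band f > 7 GHz;
    mmWave 24 GHz <= f < 300 GHz, sub-THz 100 GHz <= f < 300 GHz,
    THz 300 GHz <= f <= 3000 GHz.
    """
    if not 3e3 <= freq_hz <= 3000e9:
        raise ValueError("outside the ITU radio spectrum (3 kHz to 3000 GHz)")
    if freq_hz <= 1e9:
        labels = ["low-band"]
    elif freq_hz <= 7e9:
        labels = ["mid-band"]
    else:
        labels = ["high-band"]
    if 24e9 <= freq_hz < 300e9:
        labels.append("mmWave")
    if 100e9 <= freq_hz < 300e9:
        labels.append("sub-THz")
    if freq_hz >= 300e9:
        labels.append("THz")
    return labels

for f in (0.7e9, 3.5e9, 28e9, 140e9, 400e9):
    print(f"{f / 1e9:g} GHz: {', '.join(spectrum_labels(f))}")
```

The labels overlap by design. For example, 140 GHz is both mmWave in the strict sense and sub-THz in the research terminology mentioned above.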

⁴ Two notable exceptions are free-space optical communication that uses visible or near-visible 
light and sonic communication that uses audio waves. 

⁵ The convention of whether a frequency band is considered low or high shifts with time and 
application; in particular, the low-band for cellular communications is known as the ultra-high 
frequency band for radar and some other radio applications [12]. 

It is commonly stated that the maximum coverage range of a wireless 
communication system is longer in the low-band than in the high-band. This 
statement is often correct, but it is not caused by the phenomenon illustrated in 
Figure 1.4. Recall that we considered a free-space propagation model without 
objects between the transmitter and receiver, where the power flux density is 
independent of frequency. The differences in the free-space channel gains in 
Figure 1.4 can be fully compensated for by increasing the effective area of the 
receive antenna, so that the channel gain becomes the same irrespective of the 
signal's frequency. In particular, the channel gain definition in (1.7) becomes 
frequency-independent if the area in the numerator is constant instead of 
proportional to $\lambda^2$. Since the effective area of a single receive antenna 
reduces with increasing frequency, a fair comparison between two frequency 
bands requires antenna configurations with the same effective area in both 
bands. One way to achieve this in practice is by using multiple receive antennas 
in the higher band so that their collective effective area sums up to the same 
value as in the lower band. We will consider this in detail later in this book. 
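The fair-comparison argument can be quantified by counting how many high-band isotropic antennas are needed to match the effective area of one low-band antenna. The following Python sketch assumes, as the idealization in the text does, that the effective areas of the antennas simply add up. The function names are hypothetical. By (1.4), the count is $(f_{\mathrm{high}}/f_{\mathrm{low}})^2$.

```python
import math

C = 3e8  # speed of light approximation used in the book [m/s]

def isotropic_area(freq_hz):
    """Effective area (1.4) of one lossless isotropic antenna [m^2]."""
    return C ** 2 / (4 * math.pi * freq_hz ** 2)

def antennas_to_match(f_low_hz, f_high_hz):
    """Number of high-band isotropic antennas whose summed effective area is at
    least that of one low-band isotropic antenna, assuming the areas add up.
    The small tolerance guards against floating-point round-off."""
    ratio = isotropic_area(f_low_hz) / isotropic_area(f_high_hz)  # (f_high/f_low)^2
    return math.ceil(ratio - 1e-9)

n = antennas_to_match(3e9, 30e9)
print(f"{n} antennas at 30 GHz: {n * isotropic_area(30e9):.3e} m^2 "
      f"vs one at 3 GHz: {isotropic_area(3e9):.3e} m^2")
```

Going from 3 GHz to 30 GHz requires 100 antennas, which hints at why higher frequency bands naturally lead to larger antenna arrays.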

The main reason low-band frequencies generally have a longer coverage 
range is the signal behaviors in scenarios other than free-space propagation. In 
terrestrial communications, there are many objects in the environment around 
and between the transmitter and receiver. Signals with a lower frequency 
range propagate better through and around such objects and are reflected off 
walls more favorably. The signal absorption by atmospheric gases in the air 
also increases with the frequency. For these reasons, base stations for wide- 
area coverage typically use the low-band, while medium-range and local-area 
networks use the mid-band. Short-range networks might use the mmWave 
spectrum (or even the THz spectrum) in the high-band. Nevertheless, satellites 
commonly use the high-band spectrum to communicate with the ground over 
incredibly long distances. This works well if no blocking objects exist and the 
antennas have large effective areas. 

Despite the reduced range, there are two good reasons why new wireless 
communication systems are gradually supporting higher frequency bands. 
Firstly, large parts of the low-band and mid-band are already occupied by 
existing wireless services, making it hard to launch new services there. Secondly, 
there is generally more bandwidth available at higher carrier frequencies, and 
we will see later that the data rates increase with the bandwidth. To give 
some indicative numbers, a network operator might have licenses for 20 MHz 
in the low-band, 100 MHz in the mid-band, and 1 GHz in the high-band. 

The channel gain depends on the propagation distance d in typical terrestrial 
communication scenarios, where the transmitting base station might be 
deployed on a rooftop and the receiving user device is located in a city. 
In that case, there is no unequivocal channel gain model because the wave 
propagation depends on the exact geographical locations of buildings and 
other large-scale objects. However, we can describe the average propagation 
conditions by fitting a parametric channel gain model of the kind 

$$\beta = \Upsilon \left(\frac{1\,\mathrm{m}}{d}\right)^{\alpha} \tag{1.9}$$

to real-world channel measurements. The parameter $\alpha$ is called the pathloss 
exponent, while $\Upsilon$ is the channel gain at a 1 m reference distance. This 
parametric model is inspired by the free-space model in (1.7), which is obtained 
by $\alpha = 2$ and $\Upsilon = \frac{\lambda^2}{(4\pi)^2 (1\,\mathrm{m})^2} = \left(\frac{0.3\,\mathrm{GHz}}{4\pi f}\right)^2$ because $c/(1\,\mathrm{m}) = 0.3$ GHz. 
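Fitting (1.9) becomes a straight-line fit in the decibel domain, because $10\log_{10}(\beta) = 10\log_{10}(\Upsilon) - \alpha \cdot 10\log_{10}(d/1\,\mathrm{m})$. The following Python sketch uses ordinary least squares for that line. This is one common choice, not necessarily the procedure behind any published model. The "measurements" are synthetic samples generated from the free-space model (1.7) at 3 GHz. They are not real data and serve only to check that the fit recovers $\alpha = 2$ and $\Upsilon = \lambda^2/(4\pi)^2$.

```python
import math

def fit_pathloss(distances_m, gains_linear):
    """Least-squares fit of (1.9) in dB: 10log10(beta) = 10log10(Upsilon) - alpha*10log10(d).
    Returns (alpha, Upsilon); distances are in meters (1 m reference)."""
    xs = [10 * math.log10(d) for d in distances_m]
    ys = [10 * math.log10(g) for g in gains_linear]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -alpha
    intercept_db = y_mean - slope * x_mean # equals 10*log10(Upsilon)
    return -slope, 10 ** (intercept_db / 10)

# Synthetic samples from the free-space model (1.7) at 3 GHz (not measurements)
wavelength = 0.1
ds = [10, 20, 50, 100, 200, 500]
gains = [wavelength ** 2 / ((4 * math.pi) ** 2 * d ** 2) for d in ds]

alpha, upsilon = fit_pathloss(ds, gains)
print(f"alpha = {alpha:.3f}, Upsilon = {10 * math.log10(upsilon):.2f} dB")
```

With real measurements, the samples scatter around the line, and the fitted $\alpha$ and $\Upsilon$ describe only the average propagation conditions.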
Example 1.4. The 3GPP technical report [6] presents channel gain models for 
several propagation scenarios typical in cellular communications. For example, 
in the non-line-of-sight urban microcell (UMi) scenario [6, Table B.1.2.1-1], 
the channel gain is modeled (in decibels) as 

$$\beta_{\mathrm{UMi}} = -36.7 \log_{10}\left(\frac{d}{1\,\mathrm{m}}\right) - 22.7 - 26 \log_{10}\left(\frac{f}{1\,\mathrm{GHz}}\right)\ \mathrm{dB}. \tag{1.10}$$

This model can be used for distances d in the range 10-2000 m and frequencies 
f in the range 2-6 GHz. What are the values of $\alpha$ and $\Upsilon$ in this case, and 
how does the model differ from the free-space propagation case? 

The distance-dependent term in (1.10) is $-36.7 \log_{10}(d) = -10 \log_{10}(d^{3.67})$, 
with d in meters; thus, the pathloss exponent for this UMi channel is $\alpha = 3.67$. Since the 
exponent is larger than in free-space propagation ($\alpha = 2$), the channel gain 
decays more rapidly with the distance. This represents the fact that the 
wireless signals must interact with objects in the environment to reach the 
receiver. 

The channel gain $\Upsilon$ at the reference distance of 1 m is given by the last 
two terms in (1.10) and becomes $\Upsilon = 10^{-2.27} (1\,\mathrm{GHz}/f)^{2.6}$. This parameter is 
valid for specifying the pathloss model even if the UMi model should only be 
used for d ≥ 10 m. We notice that $\Upsilon$ decays with the frequency as $f^{-2.6}$. This 
is faster than the $f^{-2}$ behavior in free-space propagation, where the decay is caused by 
the isotropic receive antenna assumption. The extra decay describes how the 
wireless signals interact less favorably with objects as the frequency increases. 

Apart from the scaling behaviors, we can compare the channel gains 
obtained at the minimum values d = 10 m and f = 2 GHz. The channel gain is 
−67.2 dB with the UMi model and −58.5 dB in free-space propagation. Since the 
UMi gain is already lower at this point and decays faster with both the distance 
and the frequency, the UMi model gives lower gains at all distances and 
frequencies where it is valid. 
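The comparison in Example 1.4 can be reproduced by evaluating (1.10) next to the free-space gain (1.7). The following Python sketch uses hypothetical function names. It only encodes the formula and validity range quoted above and does not reimplement the full 3GPP report. It checks the values at d = 10 m and f = 2 GHz, and it checks that the UMi gain stays below the free-space gain over a grid inside the validity range.

```python
import math

def umi_gain_db(d_m, f_hz):
    """UMi non-line-of-sight channel gain (1.10) in dB; valid for 10-2000 m and 2-6 GHz."""
    if not (10 <= d_m <= 2000 and 2e9 <= f_hz <= 6e9):
        raise ValueError("outside the model's validity range")
    return -36.7 * math.log10(d_m) - 22.7 - 26 * math.log10(f_hz / 1e9)

def free_space_gain_db(d_m, f_hz, c=3e8):
    """Free-space channel gain (1.7) with isotropic antennas, in dB."""
    wavelength = c / f_hz
    return 10 * math.log10(wavelength ** 2 / ((4 * math.pi) ** 2 * d_m ** 2))

print(f"d = 10 m, f = 2 GHz: UMi {umi_gain_db(10, 2e9):.1f} dB, "
      f"free space {free_space_gain_db(10, 2e9):.1f} dB")
```

The gap between the two models widens with distance (slopes of 36.7 versus 20 dB per decade) and with frequency (26 versus 20 dB per decade).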


1.1.1 Signal-to-Noise Ratio 


Although the channel gains are typically tiny in wireless communications, many 
existing systems operate efficiently. This is possible because what matters 
is not the absolute amount of signal power received but its relative size 
compared to the noise power in the receiver hardware (and the interference 
power received from other concurrent transmissions). 

Figure 1.6: The SNR in (1.13) as a function of the propagation distance d for two different 
channel gain models: (a) free-space propagation using the channel gain model in (1.7) and 
(b) the non-line-of-sight UMi model in (1.10). The setup is defined by f = 3 GHz, B = 10 MHz, 
and either P = 10 W, P = 1 W, or P = 0.1 W. 

We let $\sigma^2$ denote the noise power. It is computed as the product $\sigma^2 = N_0 B$ 
of the noise power spectral density $N_0$ W/Hz and the signal bandwidth B Hz. 
The intuition behind this model is that the thermal noise in the receiver is a 
white random process with the constant power spectral density $N_0$ over all 
frequencies, but the receiver hardware filters out the noise that lies outside 
the signal band, thereby making the total noise power equal to $N_0$ times 
the signal bandwidth B. We will return to these modeling assumptions in 
Section 2.3.2. The noise power spectral density depends on the temperature, 
but the variations are small in most use cases. Therefore, it is common to take 
the number at room temperature (i.e., 20°C) and treat it as a constant:⁶ 

$$N_0 = 10^{-20.4}\ \mathrm{W/Hz}. \tag{1.11}$$

⁶ The actual noise power spectral density in wireless receivers is normally larger than the 
number in (1.11) since the receiver hardware adds noise of its own when amplifying the received 
signal. For example, the practical noise power might be 4-8 dB higher than the theoretical lower 
limit in (1.11). 
When reporting noise powers in the decibel scale, using dBm, the formula is

$$ 10\log_{10}\left(\frac{\sigma^2}{1\ \text{mW}}\right) = 10\log_{10}\left(\frac{N_0 B}{1\ \text{mW}}\right) = -174 + 10\log_{10}(B) \ \text{dBm}. \tag{1.12} $$
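As a quick numerical check of (1.11) and (1.12), the following minimal Python sketch computes the noise power for a few bandwidths. It uses two routes: σ² = N₀B directly, and the −174 + 10 log₁₀(B) dBm shortcut. It assumes the room-temperature value N₀ = 10^(−20.4) W/Hz and ignores the extra hardware noise mentioned in footnote 6. The function names are illustrative.

```python
import math

# Room-temperature thermal noise PSD from (1.11), in W/Hz.
# Hardware noise amplification (footnote 6) is deliberately ignored here.
N0_W_PER_HZ = 10 ** (-20.4)


def noise_power_watt(bandwidth_hz):
    """Noise power sigma^2 = N0 * B in watts."""
    return N0_W_PER_HZ * bandwidth_hz


def noise_power_dbm(bandwidth_hz):
    """Noise power in dBm, i.e., 10*log10(N0 * B / 1 mW)."""
    return 10 * math.log10(noise_power_watt(bandwidth_hz) / 1e-3)


def noise_power_dbm_shortcut(bandwidth_hz):
    """Shortcut form of (1.12): -174 + 10*log10(B) dBm."""
    return -174 + 10 * math.log10(bandwidth_hz)


for B in (25e3, 10e6, 100e6):
    print(f"B = {B / 1e6:8.3f} MHz: {noise_power_dbm(B):7.2f} dBm "
          f"(shortcut: {noise_power_dbm_shortcut(B):7.2f} dBm)")
```

Both routes agree because N₀ = 10^(−20.4) W/Hz corresponds to −174 dBm/Hz. For example, B = 10 MHz gives −104 dBm.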


The signal-to-noise ratio (SNR) is defined as

$$ \mathrm{SNR} = \frac{P\beta}{\sigma^2} = \frac{P\beta}{N_0 B}, \tag{1.13} $$

where we recall that P is the transmit power, β is the channel gain, and
σ² = N₀B is the noise power. The SNR is a dimensionless variable since it
is computed as the ratio of two powers. To get a sense of the practical
range of SNR values, Figure 1.6 shows the SNR in (1.13) in the decibel
scale as a function of the propagation distance. We consider a bandwidth of
B = 10 MHz around the frequency f = 3 GHz and use either the free-space
channel gain model in (1.7) or the UMi channel gain model in (1.10). The
SNR can be many tens of decibels for very short distances (e.g., inside a
room). For practical distances in outdoor scenarios, we can expect an SNR
below 40 dB, particularly when using the non-line-of-sight UMi model, where
the channel gain decays more rapidly with the distance. If we reduce the
transmit power, the SNR curve is shifted downwards accordingly. Many other
phenomena affect the SNR, but as a rule of thumb, the SNR in a wireless
communication system is between −10 dB and +40 dB.
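The curves in Figure 1.6 can be approximated with a short Python sketch that evaluates (1.13) in decibels. Two assumptions apply. First, the UMi model (1.10) is taken to be −22.7 − 26 log₁₀(f/1 GHz) − 36.7 log₁₀(d/1 m) dB, which is the form used in Example 1.5 and reproduces the −67.2 dB value quoted above. Second, the speed of light is rounded to 3·10⁸ m/s, so that f = 3 GHz gives λ = 0.1 m. The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 3e8    # m/s, rounded (assumption) so that f = 3 GHz gives 0.1 m
N0_DBM_PER_HZ = -174.0  # thermal noise PSD from (1.11)-(1.12)


def free_space_gain_db(d_m, f_hz):
    """Free-space channel gain (lambda / (4*pi*d))^2 in dB, isotropic antennas."""
    wavelength = SPEED_OF_LIGHT / f_hz
    return 20 * math.log10(wavelength / (4 * math.pi * d_m))


def umi_gain_db(d_m, f_hz):
    """NLOS UMi gain in dB; ASSUMED form of (1.10), as used in Example 1.5."""
    return -22.7 - 26 * math.log10(f_hz / 1e9) - 36.7 * math.log10(d_m)


def snr_db(p_watt, gain_db, bandwidth_hz):
    """SNR in (1.13) in dB: P [dBm] + beta [dB] - sigma^2 [dBm]."""
    p_dbm = 10 * math.log10(p_watt / 1e-3)
    noise_dbm = N0_DBM_PER_HZ + 10 * math.log10(bandwidth_hz)
    return p_dbm + gain_db - noise_dbm


f, B, P = 3e9, 10e6, 1.0  # the Figure 1.6 setup with P = 1 W
for d in (10, 100, 1000):
    print(f"d = {d:5d} m: free-space {snr_db(P, free_space_gain_db(d, f), B):6.1f} dB, "
          f"UMi {snr_db(P, umi_gain_db(d, f), B):6.1f} dB")
```

At d = 100 m, this gives roughly 52 dB in free space and 25.5 dB with the UMi model. Changing P shifts both curves by the same number of decibels, as the text describes.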


Example 1.5. Consider a communication setup where the SNR is 30 dB at a
400 m distance from the transmitter when using the free-space channel gain
in (1.7) with f = 3 GHz (i.e., λ = 0.1 m). What will be the new SNR at that
distance if we switch to using the UMi channel gain model in (1.10)?

Due to the linear relation between SNR and the channel gain in (1.13), 
the SNR in the modified UMi setup is 


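$$ \mathrm{SNR}_{\mathrm{UMi}} = \frac{P\beta_{\mathrm{UMi}}}{N_0 B} = \frac{P\beta}{N_0 B}\cdot\frac{\beta_{\mathrm{UMi}}}{\beta} \quad\Longrightarrow\quad \mathrm{SNR}_{\mathrm{UMi}} = 30 + 10\log_{10}\left(\frac{\beta_{\mathrm{UMi}}}{\beta}\right) \ \text{dB}, \tag{1.14} $$

where β is the free-space channel gain from (1.7) and β_UMi was defined in
(1.10). By inserting numbers into this expression, we obtain

$$ \mathrm{SNR}_{\mathrm{UMi}} = 30 - 36.7\log_{10}(400) - 22.7 - 26\log_{10}(3) - 20\log_{10}\left(\frac{0.1}{4\pi\cdot 400}\right) \approx -6.58 \ \text{dB}. \tag{1.15} $$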


This new SNR is 36.58 dB smaller (i.e., about 4550 times smaller), which shows
that the SNR can vary greatly with the propagation conditions. Such large
variations can hardly be compensated for by increasing the transmit power.
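The arithmetic in (1.14) and (1.15) can be reproduced with a minimal Python sketch. It assumes the UMi model has the form used in (1.15), −22.7 − 26 log₁₀(f/1 GHz) − 36.7 log₁₀(d/1 m) dB. The SNR changes by exactly the decibel difference between the two channel gains, because P, N₀ and B are unchanged.

```python
import math

# Example 1.5: 30 dB SNR with free-space propagation at d = 400 m, f = 3 GHz.
snr_free_space_db = 30.0
d, wavelength, f_ghz = 400.0, 0.1, 3.0

# Free-space gain (1.7) and the UMi gain in the form used in (1.15), in dB.
beta_fs_db = 20 * math.log10(wavelength / (4 * math.pi * d))
beta_umi_db = -22.7 - 26 * math.log10(f_ghz) - 36.7 * math.log10(d)

# (1.14): the SNR is scaled by the ratio beta_UMi / beta.
snr_umi_db = snr_free_space_db + (beta_umi_db - beta_fs_db)
loss_db = snr_free_space_db - snr_umi_db
print(f"SNR_UMi = {snr_umi_db:.2f} dB, i.e., {loss_db:.2f} dB lower "
      f"(a factor of about {10 ** (loss_db / 10):.0f})")
```

The linear factor printed here is about 4545, which the text rounds to 4550.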
Figure 1.7: Since the base station (to the left) and the phone (to the right) have different
transmit powers, the areas where the SNR is above the minimum threshold that enables successful 
communications will be different in the downlink and uplink. One way to deal with this problem 
is to reduce the bandwidth in the uplink so that the SNR becomes the same as in the downlink. 


As mentioned earlier in this chapter, the transmit power can vary signifi- 
cantly between different devices, including those communicating with each 
other using the same communication standard. In a cellular network, the base 
station might transmit with 40 W, while the cell phone uses 0.1 W. This is a
difference of 40/0.1 = 400 ≈ 26 dB, which implies that the SNR is 26 dB better
when transmitting in the downlink (from the base station to the phone) than 
when transmitting in the uplink (from the phone to the base station) over the 
same frequency band. It is necessary to communicate in both directions to 
keep a cellular network operational, which makes the uplink transmission the 
weakest link. A practical solution to this problem is to utilize only a fraction 
of the bandwidth when the user transmits, which increases the SNR since the 
noise power reduces. In other words, we put all the signal power into a narrower 
range of frequencies. This principle is illustrated in Figure 1.7 by showing 
the geographical area where a receiver would get an SNR above a certain 
threshold required for successful communication (e.g., −10 dB). The yellow
area for the downlink transmission with B = 10 MHz contains the phone;
thus, the downlink transmission will be successful. However, the red area for 
the uplink transmission is substantially smaller and does not contain the base 
station. The yellow and red areas use the same bandwidth of 10 MHz in the 
uplink and downlink. However, if the phone only uses 10 MHz/400 = 25 kHz 
of bandwidth, the blue uplink area is obtained, and it is as large as the yellow 
downlink area. In practice, the bandwidth that is used by the phone can be 
varied dynamically depending on how far from the base station the user is. 

Another solution is to use different frequency bands in the uplink and 
downlink. Suppose the base station and phone can use both the low-band and 
the mid-band. It is then possible to let the phone transmit its signals in the 
low-band where the range is longer and there is less bandwidth, while the
base station transmits in the mid-band where the higher power compensates 
for a shorter range and broader bandwidth. The 5G NR standard supports 
this solution to enhance the coverage range of base stations. When wider 
bandwidths in the mid-band (or high-band) are utilized only for downlink 
transmission, the downlink data rates will be substantially higher than the 
uplink data rates. 


Example 1.6. Consider a phone that transmits 200 mW and that is connected 
to a communication system with a bandwidth of B = 20 MHz. When using the 
entire bandwidth, the uplink SNR is −30 dB. Suppose the uplink SNR must
be at least −10 dB for the system to be operational. How much bandwidth
can the phone use?

The phone must reduce the uplink bandwidth so that the SNR increases
by −10 − (−30) = 20 dB, which is a factor of 100. Hence, it can use at most
an uplink bandwidth of 20 MHz/100 = 200 kHz.
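The bandwidth adaptation in Figure 1.7 and Example 1.6 follows from the 1/B dependence in (1.13). The minimal Python sketch below assumes that P and β stay fixed while B is reduced. The helper name `max_bandwidth_hz` is hypothetical, and the second call measures SNRs relative to the downlink SNR, which is an illustrative choice.

```python
import math


def max_bandwidth_hz(bandwidth_hz, snr_full_db, snr_min_db):
    """Largest bandwidth whose SNR meets snr_min_db.

    Assumes P and beta are fixed, so SNR = P*beta/(N0*B) grows by a factor k
    when B shrinks by the same factor k, cf. (1.13).
    """
    required_gain_db = snr_min_db - snr_full_db
    if required_gain_db <= 0:
        return bandwidth_hz  # the full bandwidth already meets the target
    return bandwidth_hz / 10 ** (required_gain_db / 10)


# Example 1.6: 20 MHz gives -30 dB, but at least -10 dB is required.
print(max_bandwidth_hz(20e6, -30.0, -10.0))

# Figure 1.7 setup: 40 W vs 0.1 W over 10 MHz. Taking the downlink SNR as the
# 0 dB reference, the full-band uplink SNR is -10*log10(400) dB.
uplink_gap_db = 10 * math.log10(40 / 0.1)
print(max_bandwidth_hz(10e6, -uplink_gap_db, 0.0))
```

The first call gives 200 kHz and the second gives 25 kHz. These match Example 1.6 and the 10 MHz/400 uplink bandwidth discussed with Figure 1.7.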


1.1.2 Fraunhofer Distance 


The analysis has thus far been based on isotropic antennas, which is a hy- 
pothetical concept, as noted earlier. This book is not focused on antenna 
design or detailed modeling of individual antennas but on the phenomena, 
benefits, and challenges that occur when having multiple antennas. However, 
we will briefly describe a few fundamental antenna properties essential to un- 
derstanding the connection between fixed directive antennas and the adaptive 
directivity obtained using multiple antennas. 

When we derived the channel gain equation for free-space propagation, 
we used Figure 1.1, where the receive antenna is located on the surface of 
a sphere because the transmitted signal spreads out as a sphere with an 
increasing radius. This implies that the receive antenna must be curved to 
fit on the surface area; otherwise, the transmitted signal will reach different 
parts of the antenna at different times. Practical antennas are generally flat, 
creating a mismatch that we will now analyze in detail. Figure 1.8 shows a flat 
receive antenna perpendicular to the direction of the propagating wave. When 
the spherical wavefront of the transmitted signal reaches the center of the 
receive antenna, it has not yet reached its edges. As a result, the impinging 
electric field will vary in phase and amplitude over the antenna surface. This 
has consequences for the intercepted signal power, which can typically be 
computed by integrating the power flux density of the impinging electric field 
over the receive antenna’s surface. The maximum power is intercepted when 
the impinging electric field is constant over the antenna, which happens in 
the ideal case when the wavefront is planar and impinges perpendicularly. 

Figure 1.8: When a spherical wavefront approaches a flat receive antenna, there will be a
delay between when the wave reaches the antenna’s center and edge. This delay (or difference
in propagation distance) turns into a phase-shift. The phase-shift is small or large depending on
the relation between the distance d, the width a of the receive antenna, and the wavelength λ.

When the propagation distance is sufficiently large compared to the antenna
size, the spherical wavefront can be locally approximated as planar when
considering the power the antenna intercepts. If the distance is d from the
transmitter to the antenna’s center and the antenna’s width is a, then we can
compute the distance d′ from the transmitter to the antenna’s edges using
the Pythagorean theorem as

$$ d' = \sqrt{d^2 + \left(\frac{a}{2}\right)^2} = d\sqrt{1 + \left(\frac{a}{2d}\right)^2}. \tag{1.16} $$


When a sinusoidal signal with the wavelength λ needs to travel an extra
distance d′ − d to reach the edge, then there will be a phase difference of⁷

$$ \frac{2\pi}{\lambda}(d'-d) = \frac{2\pi}{\lambda}\left(d\sqrt{1+\left(\frac{a}{2d}\right)^2} - d\right) \approx \frac{2\pi}{\lambda}\left(d + \frac{a^2}{8d} - d\right) = \frac{\pi a^2}{4d\lambda} \ \text{[rad]} \tag{1.17} $$

between the signal received at the edge and the center. The simplified expres-
sion in (1.17) is obtained by using the Taylor approximation √(1 + x²) ≈ 1 + x²/2,
which is tight (the error is less than 0.05%) for 0 ≤ x ≤ 0.25. Since x = a/(2d)
in this case, x ≤ 0.25 implies we need to consider distances d ≥ 2a. The phase
difference in (1.17) will never be zero, but it will be close to zero when the
propagation distance d is much larger than the width a of the antenna. It
is common to assume (somewhat arbitrarily) that the phase variations over
the antenna can be neglected if the maximum difference in (1.17) is no larger
than π/8 radians (22.5 degrees) [14]. By following this convention, we get the
relation

$$ \frac{\pi a^2}{4d\lambda} \le \frac{\pi}{8} \quad\Longleftrightarrow\quad d \ge \frac{2a^2}{\lambda}. \tag{1.18} $$


The impinging wavefront also varies in amplitude between the center and
the edge since the received signal amplitude is inversely proportional to the
distance. The relative difference is d/d′, and this ratio is between 0.97 and 1
for distances d ≥ 2a, because d = 2a gives

$$ \frac{d}{d'} = \frac{2a}{\sqrt{(2a)^2 + \left(\frac{a}{2}\right)^2}} = \sqrt{\frac{16}{17}} \approx 0.97. \tag{1.19} $$

Hence, if the distance between the transmitter and receiver is simultaneously
greater than 2a²/λ and 2a, we can neglect the spherical shape of the wavefront
(when considering both the phase and amplitude) and compute channel gains
in the way previously described. In other words, we can treat the impinging
wave as a plane wave that travels in one angular direction and only depends on
time and the location along that direction; at any time instance, the wave
is constant within any given plane perpendicular to the direction of travel.⁸
The impinging wave is only approximately planar at the local level, observable
at the receiver, but remains spherical at the global level. This is similar to
how Earth appears flat to an observer on the ground, although it is curved.

7Suppose the signal sin(2πft) is transmitted in Figure 1.8. The signal reaching the center
of the antenna is sin(2πf(t − d/c)), while the signal reaching the edge of the antenna is
sin(2πf(t − d′/c)). The phase difference between these signals is 2πf(d′ − d)/c = 2π(d′ − d)/λ.

The minimum distance in (1.18) is called the Fraunhofer distance and
is named after Joseph von Fraunhofer, who studied many electromagnetic
phenomena. It is occasionally also called the Rayleigh distance. The region
that lies beyond the Fraunhofer distance is known as the far-field of the
antenna. The Fraunhofer distance was derived based on two approximations
but is known to be a good rule of thumb. When the propagation distance d
is smaller than either 2a²/λ or 2a, we are in the near-field of the antenna.
The near-field can be divided into two parts. The radiative near-field is an
intermediate region where the propagation distance to the receiver is too short
to neglect the phase and/or amplitude variations over the receive antenna but
large enough to avoid direct hardware interaction between the transmitter
and receiver. The reactive near-field is closest to the transmitter and includes
additional electromagnetic effects such as evanescent waves and magnetic
induction. These are examples of electric and magnetic field components that
can only be observed near the transmitter, typically up to a maximum distance
of λ/(2π). Specific standards exist for near-field communication (NFC) that
are commonly used by smartphones and cards to enable short-range payments
and identification. This book, which focuses on radiated electromagnetic waves,
will not cover these technologies.

To shed light on how far away the far-field is, suppose the receive antenna
in Figure 1.8 has the width a = λ, for which 2a²/λ and 2a are both equal to
2λ. Hence, if the receive antenna is at least 1 m from the isotropic transmit
antenna, we are guaranteed to be in the far-field for any frequency band
of interest in wireless communications (because the wavelength is typically
shorter than 0.5 m). This condition is almost always satisfied.

8An ideal plane wave fills the infinitely large three-dimensional world (i.e., ℝ³) and, thus,
cannot exist in practice. However, the impinging wave observed over an antenna of finite width
a will be perceived as being a finite-sized portion of a plane wave when d ≥ 2a²/λ.
The Fraunhofer distance in (1.18) is truly wavelength-dependent, in contrast
to the free-space channel gain in (1.7), whose wavelength-dependence
was caused by the assumption of having an isotropic receive antenna. The
distances d and d′ in Figure 1.8 are computed based on geometrical arguments
that do not involve the wavelength λ. However, when the wave travels the
extra distance d′ − d to reach the edge, the wavelength determines how large
the resulting phase-shift is. For a fixed-sized antenna, the Fraunhofer distance
in (1.18) is inversely proportional to λ, making it larger in the high-band than
in the low-band. However, suppose the antenna size is proportional to the
wavelength. In that case, we get the opposite behavior, as shown by the fact
that a = λ gives the Fraunhofer distance 2λ, which is proportional to λ.
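The two approximations behind the Fraunhofer distance can be checked numerically. The minimal Python sketch below evaluates the exact phase difference 2π(d′ − d)/λ from (1.16)–(1.17) and the amplitude ratio d/d′ at the far-field boundary max(2a²/λ, 2a). The antenna widths and the 0.1 m wavelength are illustrative choices, not values from the text.

```python
import math


def fraunhofer_distance(a, wavelength):
    """Far-field boundary: phase condition (1.18) and amplitude condition d >= 2a."""
    return max(2 * a ** 2 / wavelength, 2 * a)


def edge_phase_difference(d, a, wavelength):
    """Exact phase difference 2*pi*(d' - d)/lambda between edge and center."""
    d_edge = math.sqrt(d ** 2 + (a / 2) ** 2)  # Pythagorean theorem, (1.16)
    return 2 * math.pi * (d_edge - d) / wavelength


def edge_amplitude_ratio(d, a):
    """Amplitude at the edge relative to the center, d / d'."""
    return d / math.sqrt(d ** 2 + (a / 2) ** 2)


wavelength = 0.1  # m, i.e., f = 3 GHz (illustrative)
for a in (0.1, 0.5, 1.0):  # antenna widths in m (illustrative)
    d_f = fraunhofer_distance(a, wavelength)
    phase = edge_phase_difference(d_f, a, wavelength)
    print(f"a = {a} m: d_F = {d_f:.2f} m, phase = {math.degrees(phase):.2f} deg, "
          f"amplitude ratio = {edge_amplitude_ratio(d_f, a):.4f}")
```

For a = λ, the boundary is 2λ = 0.2 m and the amplitude ratio is √(16/17) ≈ 0.97, as in (1.19). For larger antennas, the exact phase at the boundary approaches π/8 (22.5 degrees), because the Taylor approximation becomes tighter.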


Example 1.7. What is the Fraunhofer distance when considering a rectangular 
receive antenna with width a and height b? 

Suppose d is the distance from the transmitter to the center of the antenna.
Following the same steps as before, we can compute the distance d′ to the
antenna’s corners as

$$ d' = \sqrt{d^2 + \left(\frac{a}{2}\right)^2 + \left(\frac{b}{2}\right)^2} = d\sqrt{1 + \left(\frac{D}{2d}\right)^2}, \tag{1.20} $$

where we have defined D = √(a² + b²) as the length of the diagonal of the
rectangular antenna. The difference d′ − d leads to the phase difference

$$ \frac{2\pi}{\lambda}(d'-d) = \frac{2\pi}{\lambda}\left(d\sqrt{1+\left(\frac{D}{2d}\right)^2} - d\right) \approx \frac{2\pi}{\lambda}\left(d + \frac{D^2}{8d} - d\right) = \frac{\pi D^2}{4d\lambda} \ \text{[rad]} \tag{1.21} $$

between the signals captured at the center and the corners, using the same
Taylor approximation as in (1.17). We recall that the Fraunhofer distance
is obtained when the phase difference is π/8. Solving πD²/(4dλ) = π/8 for d
yields 2D²/λ. The only difference from (1.18) is that D has replaced a. Generally
speaking, for any antenna shape, the Fraunhofer distance is 2D²/λ, where
D is the largest distance between any two points on the antenna.
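The general rule 2D²/λ from Example 1.7 can be sketched in Python for an antenna described by the vertices of a polygon. For a polygon, the largest distance between any two points is attained between two vertices. The corner coordinates and the wavelength below are illustrative, and the helper names are hypothetical.

```python
import itertools
import math


def max_dimension(points):
    """Largest distance D between any two vertices of a polygonal antenna."""
    return max(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def fraunhofer_distance(points, wavelength):
    """Fraunhofer distance 2*D^2/lambda, generalizing (1.18) as in Example 1.7."""
    D = max_dimension(points)
    return 2 * D ** 2 / wavelength


# Rectangular antenna with width a and height b, described by its four corners.
a, b, wavelength = 0.3, 0.4, 0.1  # meters (illustrative values)
corners = [(0.0, 0.0), (a, 0.0), (0.0, b), (a, b)]
print(max_dimension(corners), fraunhofer_distance(corners, wavelength))
```

Here the diagonal is D = √(0.3² + 0.4²) = 0.5 m, so the Fraunhofer distance is 2·0.25/0.1 = 5 m. `math.dist` requires Python 3.8 or newer.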


1.1.3 Antenna Directivity Gains 


We will now move beyond isotropic antennas and provide the basic charac- 
terization of antenna directivity. Practical transmit antennas radiate a larger 
fraction of their power in some angular directions than others. The transmitted 
signal will still propagate as a sphere with an expanding radius, as illustrated 
in Figure 1.1, but the signal power is unequally distributed over the surface 
area. We need a spherical coordinate system to specify the power distribution 
over the sphere. There are different ways to define spherical coordinates. We 
use the definition in Figure 1.9, where a point at a distance d from the origin
is characterized by the azimuth angle φ ∈ [−π, π) in the xy-plane and the
elevation angle θ ∈ [−π/2, π/2]. Any point in the three-dimensional world can
be uniquely described using either conventional Cartesian coordinates (x, y, z)
or the spherical coordinates (d, φ, θ). The one-to-one mapping between these
coordinate systems can be defined as

$$ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = d \begin{bmatrix} \cos(\varphi)\cos(\theta) \\ \sin(\varphi)\cos(\theta) \\ \sin(\theta) \end{bmatrix}. \tag{1.22} $$

This relation makes it easy to compute the Cartesian coordinates (x, y, z) of
a point that is specified in spherical coordinates. The opposite transformation
involves inverse trigonometric functions, and we must be careful when
computing the azimuth angle so it is not shifted incorrectly by ±π.


Example 1.8. How can the point with the Cartesian coordinates (x, y, z) =
(3, 4, 5) be expressed using the spherical coordinates (d, φ, θ)?

Using the relations in (1.22), we first obtain that

$$ x^2 + y^2 + z^2 = d^2\Big(\underbrace{\cos^2(\varphi)\cos^2(\theta) + \sin^2(\varphi)\cos^2(\theta)}_{=\cos^2(\theta)} + \sin^2(\theta)\Big) = d^2. \tag{1.23} $$

Hence, we have d = √(x² + y² + z²) = √(3² + 4² + 5²) = 5√2. By using (1.22),
we can further notice that

$$ \frac{y}{x} = \frac{d\sin(\varphi)\cos(\theta)}{d\cos(\varphi)\cos(\theta)} = \tan(\varphi). \tag{1.24} $$

We know that φ ∈ (−π/2, π/2) since x is positive; thus, we obtain
φ = arctan(4/3) radians when solving for φ. Lastly, we note that

$$ \frac{z}{\sqrt{x^2+y^2}} = \frac{d\sin(\theta)}{d\sqrt{\cos^2(\varphi)\cos^2(\theta) + \sin^2(\varphi)\cos^2(\theta)}} = \tan(\theta), \tag{1.25} $$

where we have utilized that cos(θ) ≥ 0 for θ ∈ [−π/2, π/2]. By solving for θ, we
obtain θ = arctan(5/5) = arctan(1) = π/4 radians. In summary, the spherical
coordinates of the given point are (d, φ, θ) = (5√2, arctan(4/3), π/4).
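The following Python sketch implements (1.22) and its inverse. It uses `math.atan2`, which selects the correct quadrant and so avoids the ±π shift mentioned above. Because `atan2` returns angles in (−π, π] while this chapter uses φ ∈ [−π, π), the value π is mapped to −π. This is a minimal sketch with hypothetical function names.

```python
import math


def spherical_to_cartesian(d, phi, theta):
    """Mapping (1.22): azimuth phi in [-pi, pi), elevation theta in [-pi/2, pi/2]."""
    return (d * math.cos(phi) * math.cos(theta),
            d * math.sin(phi) * math.cos(theta),
            d * math.sin(theta))


def cartesian_to_spherical(x, y, z):
    """Inverse of (1.22) using atan2, so the azimuth lands in the right quadrant."""
    d = math.sqrt(x ** 2 + y ** 2 + z ** 2)
    phi = math.atan2(y, x)  # in (-pi, pi]
    if phi == math.pi:
        phi = -math.pi      # map to the convention phi in [-pi, pi)
    theta = math.atan2(z, math.hypot(x, y))  # in [-pi/2, pi/2]
    return d, phi, theta


# Example 1.8: the point (3, 4, 5).
d, phi, theta = cartesian_to_spherical(3, 4, 5)
print(d, phi, theta)  # compare with 5*sqrt(2), arctan(4/3), pi/4
```

Note that a plain arctan(y/x) would give the same azimuth for (3, 4, 5) and (−3, −4, 5). The `atan2` call separates these two points.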


Figure 1.9: The directivity gain of an antenna is described using spherical coordinates. A
location on the surface area is determined by the distance d, the azimuth angle φ ∈ [−π, π),
and the elevation angle θ ∈ [−π/2, π/2].

When transmitting with power P, the signal intensity at the point (d, φ, θ)
is determined by the general power flux density function U(P, d, φ, θ),
measured in W/m². We will only consider the far-field (i.e., d larger than the
Fraunhofer distance) because then the angular distribution over the sphere
is approximately constant when we change the radius. This is not the case
in the near-field for various electromagnetic reasons. In the far-field, we can
decompose the power flux density function as

$$ U(P, d, \varphi, \theta) = \underbrace{\frac{P}{4\pi d^2}}_{\text{Average power density}}\ \underbrace{G(\varphi, \theta)}_{\text{Antenna gain}}, \tag{1.26} $$

where the first term is the average power flux density at the given distance d in
W/m² (i.e., the transmit power divided by the surface area) and G(φ, θ) is the
antenna gain function. The antenna gain function describes how the radiated
power is distributed over azimuth angles φ ∈ [−π, π) and elevation angles
θ ∈ [−π/2, π/2]. A lossless isotropic antenna is represented by G(φ, θ) = 1 for
all angles, often reported using the decibel scale as 0 dBi, where dBi stands
for decibels-isotropic (i.e., the gain relative to an isotropic antenna).

Any practical antenna has an angle-dependent antenna gain function that is larger than
0 dBi for some angles and smaller for others. However, for a lossless antenna, the average
antenna gain is identical to that of an isotropic antenna. This implies that all antenna gain
functions for lossless antennas must satisfy the condition⁹

$$ \frac{1}{4\pi}\int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2} G(\varphi, \theta)\cos(\theta)\,\partial\theta\,\partial\varphi = 1, \tag{1.27} $$

where 4π is the surface area of the unit sphere and cos(θ)∂θ∂φ is the area of
a surface element in the direction (φ, θ) that appears when integrating over a
sphere using spherical coordinates. The cosine term represents the fact that
there is less area near the north/south poles than along the equator.


9Power losses appear in practical antennas, in which case the left-hand side of (1.27) becomes 
smaller than one. 
Example 1.9. To examine how the formula (1.27) is derived, we consider an
isotropic lossless antenna in the origin that transmits with power P. What is
the total power reaching the surface of a sphere with radius d?

The power flux density function is U(P, d, φ, θ) = P/(4πd²) with an isotropic
lossless transmit antenna. By integrating over the surface area of a sphere
with radius d, we obtain the total power as

$$ P_{\mathrm{tot}}(d) = \iiint_{x^2+y^2+z^2=d^2} \frac{P}{4\pi(x^2+y^2+z^2)}\,\partial x\,\partial y\,\partial z. \tag{1.28} $$


It is convenient first to transform the Cartesian coordinates into spherical 
coordinates to evaluate the integral. The integral in (1.28) then becomes 


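$$ P_{\mathrm{tot}}(d) = \int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2} \frac{P}{4\pi d^2}\, J(d, \varphi, \theta)\,\partial\theta\,\partial\varphi, \tag{1.29} $$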


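where there is no integral with respect to the distance since all points on the
sphere have the same distance d. The Jacobian determinant J(d, φ, θ) appears
due to the change of variables and is computed based on (1.22) as

$$
\begin{aligned}
J(d, \varphi, \theta) &= \det\begin{bmatrix}
\frac{\partial d\cos(\varphi)\cos(\theta)}{\partial d} & \frac{\partial d\cos(\varphi)\cos(\theta)}{\partial \varphi} & \frac{\partial d\cos(\varphi)\cos(\theta)}{\partial \theta} \\
\frac{\partial d\sin(\varphi)\cos(\theta)}{\partial d} & \frac{\partial d\sin(\varphi)\cos(\theta)}{\partial \varphi} & \frac{\partial d\sin(\varphi)\cos(\theta)}{\partial \theta} \\
\frac{\partial d\sin(\theta)}{\partial d} & \frac{\partial d\sin(\theta)}{\partial \varphi} & \frac{\partial d\sin(\theta)}{\partial \theta}
\end{bmatrix} \\
&= \det\begin{bmatrix}
\cos(\varphi)\cos(\theta) & -d\sin(\varphi)\cos(\theta) & -d\cos(\varphi)\sin(\theta) \\
\sin(\varphi)\cos(\theta) & d\cos(\varphi)\cos(\theta) & -d\sin(\varphi)\sin(\theta) \\
\sin(\theta) & 0 & d\cos(\theta)
\end{bmatrix} \\
&= d^2\Big(\underbrace{\cos^2(\varphi)\cos^3(\theta) + \sin^2(\varphi)\cos^3(\theta)}_{=\cos^3(\theta)} + \underbrace{\sin^2(\varphi)\cos(\theta)\sin^2(\theta) + \cos^2(\varphi)\cos(\theta)\sin^2(\theta)}_{=\cos(\theta)\sin^2(\theta)}\Big) \\
&= d^2\cos(\theta).
\end{aligned}
\tag{1.30}
$$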


After inserting J(d, φ, θ) into the integral in (1.29), we obtain

$$ P_{\mathrm{tot}}(d) = \frac{P}{4\pi}\int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2} \cos(\theta)\,\partial\theta\,\partial\varphi = P. \tag{1.31} $$

This is equivalent to (1.27) for G(φ, θ) = 1, which is the gain of a lossless
isotropic antenna. If we consider an arbitrary lossless antenna, its gain function
G(φ, θ) also appears inside the integral, and we thereby obtain the general
condition in (1.27) for preserving the total transmit power.
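The Jacobian in (1.30) can be sanity-checked numerically without symbolic algebra. The minimal Python sketch below approximates the 3×3 Jacobian matrix of the mapping (1.22) with central finite differences and compares its determinant with d² cos(θ). The step size h and the test points are illustrative choices.

```python
import math


def to_cartesian(d, phi, theta):
    """Mapping (1.22) from (d, phi, theta) to (x, y, z)."""
    return [d * math.cos(phi) * math.cos(theta),
            d * math.sin(phi) * math.cos(theta),
            d * math.sin(theta)]


def numerical_jacobian_det(d, phi, theta, h=1e-6):
    """Determinant of the Jacobian of (1.22) via central finite differences."""
    point = [d, phi, theta]
    m = [[0.0] * 3 for _ in range(3)]  # m[i][k] = d x_i / d u_k
    for k in range(3):
        plus, minus = point.copy(), point.copy()
        plus[k] += h
        minus[k] -= h
        fp, fm = to_cartesian(*plus), to_cartesian(*minus)
        for i in range(3):
            m[i][k] = (fp[i] - fm[i]) / (2 * h)
    # Cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


for d, phi, theta in [(1.0, 0.3, 0.2), (2.5, -1.2, 1.0), (4.0, 2.9, -0.7)]:
    print(numerical_jacobian_det(d, phi, theta), d ** 2 * math.cos(theta))
```

The two printed columns should agree to many decimals. The Jacobian does not depend on φ, which is why only cos(θ) remains in (1.31).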
The antenna gain function G(φ, θ) provides a complete description of the
angular variations in antenna gain. However, if the antenna is rotated perfectly
towards the receiver, it is sufficient to know the maximum gain

$$ G_{\max} = \max_{\varphi, \theta}\, G(\varphi, \theta). \tag{1.32} $$

This value is typically used when categorizing and comparing practical
antennas. It is particularly common to represent the maximum gain in the
decibel scale as

$$ 10\log_{10}(G_{\max}) = \max_{\varphi, \theta}\, 10\log_{10}\big(G(\varphi, \theta)\big) \ \text{[dBi]}. \tag{1.33} $$


A simple example of a non-isotropic antenna gain function is

$$ G(\varphi, \theta) = \begin{cases} 4\cos(\varphi)\cos(\theta), & \text{if } \varphi \in [-\pi/2, \pi/2],\ \theta \in [-\pi/2, \pi/2], \\ 0, & \text{elsewhere}. \end{cases} \tag{1.34} $$

This antenna concentrates the radiated power in the direction φ = θ = 0,
where the maximum antenna gain is G_max = 4, which is usually reported
as 10 log₁₀(4) ≈ 6 dBi. When varying the azimuth angle, the gain reduces as
cos(φ) and reaches zero at φ = ±π/2. The gain value is zero for φ ∈ [−π, −π/2]
and φ ∈ [π/2, π), which effectively means that the antenna only radiates
into one half-space. The gain variations are similar in the elevation domain.
In practice, this behavior can be achieved by a microstrip patch antenna, 
consisting of a metal patch printed on a substrate that acts as a reflecting 
ground plane. The maximum gain is then obtained perpendicularly to the 
patch while there is (ideally) no signal radiated at the backside. Patch antennas 
are extensively used in both mobile phones and base stations, thanks to their 
compact size and weight. Exact antenna gain models can be found in textbooks 
on antenna theory [8, Ch. 14], but (1.34) serves as a basic abstraction that 
we call the cosine antenna. 


Example 1.10. Verify that the cosine antenna satisfies the lossless antenna 
condition in (1.27). 
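Direct computation based on the antenna gain expression in (1.34) yields

$$ \frac{1}{4\pi}\int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2} G(\varphi, \theta)\cos(\theta)\,\partial\theta\,\partial\varphi = \frac{1}{4\pi}\int_{-\pi/2}^{\pi/2}\int_{-\pi/2}^{\pi/2} 4\cos(\varphi)\cos^2(\theta)\,\partial\theta\,\partial\varphi = \frac{1}{\pi}\underbrace{\int_{-\pi/2}^{\pi/2}\cos(\varphi)\,\partial\varphi}_{=2}\ \underbrace{\int_{-\pi/2}^{\pi/2}\cos^2(\theta)\,\partial\theta}_{=\pi/2} = 1. \tag{1.35} $$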
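The lossless condition (1.27) can also be verified numerically for any gain function. The minimal Python sketch below approximates the left-hand side of (1.27) with a midpoint rule on an n-by-n grid of (φ, θ) values. It applies the rule to the isotropic antenna and to the cosine antenna (1.34). The grid size is an illustrative choice that trades accuracy for speed.

```python
import math


def average_gain(gain, n=200):
    """Left-hand side of (1.27) by a midpoint rule on an n-by-n (phi, theta) grid."""
    dphi, dtheta = 2 * math.pi / n, math.pi / n
    total = 0.0
    for i in range(n):
        phi = -math.pi + (i + 0.5) * dphi
        for j in range(n):
            theta = -math.pi / 2 + (j + 0.5) * dtheta
            total += gain(phi, theta) * math.cos(theta) * dtheta * dphi
    return total / (4 * math.pi)


def cosine_antenna(phi, theta):
    """Cosine antenna (1.34); theta always lies in [-pi/2, pi/2] on this grid."""
    if -math.pi / 2 <= phi <= math.pi / 2:
        return 4 * math.cos(phi) * math.cos(theta)
    return 0.0


print(average_gain(lambda phi, theta: 1.0))  # isotropic antenna
print(average_gain(cosine_antenna))          # cosine antenna
```

Both averages come out close to 1, in line with (1.31) and (1.35). With an even n, the grid cells line up with the cutoffs at φ = ±π/2, which keeps the error small.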


The antenna gain function of the cosine antenna is illustrated in Figure 1.10,
where its values are plotted over the surface of a unit sphere. The
pattern illustrates how the radiated power is distributed over different angular
directions. The power is concentrated over half of the sphere and maximized
at its center. The maximum value is 6 dBi, while the average value is 0 dBi,
as is the case for all lossless antennas.

Figure 1.10: The antenna gain of the cosine antenna in (1.34) is plotted over the unit sphere.
The pattern shows how the radiated power is distributed unequally over the angular directions,
with the maximum appearing at (x, y, z) = (1, 0, 0). The color shows the antenna gain in the
decibel scale compared to an isotropic antenna.

Figure 1.11 compares the antenna gain functions of a cosine antenna and
an isotropic antenna for θ = 0 and different values of the azimuth angle φ.
The isotropic antenna has a constant gain value of 0 dBi, while the gain of the
cosine antenna ranges from 6 dBi down to zero (−∞ dBi). The total transmit power
is the same for both types of antennas, but the cosine antenna concentrates
the radiated power in specific directions. This means that a receiver located
in that direction will receive a stronger signal than when using an isotropic
antenna. Receivers in other directions will receive less power, and those at the
backside of the antenna receive nothing. Hence, depending on the receiver’s
location, the antenna gain variations can be either a benefit or a drawback.
Receivers located in directions where the solid curve in Figure 1.11
is above the dashed line will experience signal amplification compared to an
isotropic transmit antenna.

Antennas are reciprocal by nature, which means that the same antenna
gain is achieved when transmitting to a receiver in the direction (φ, θ) and
when receiving a signal from that direction. Recall that the antenna gain
describes how much stronger/weaker the signal power is compared to the
reference case with an isotropic antenna. We stated in (1.3) that the effective
area of a receiving isotropic antenna is λ²/(4π). Hence, if the antenna gain
function is G(φ, θ) for another type of antenna, its effective area will be

$$ A(\varphi, \theta) = \frac{\lambda^2}{4\pi}\, G(\varphi, \theta) \tag{1.36} $$

when receiving a signal from direction (φ, θ).

Figure 1.11: The antenna gains observed in different azimuth angles φ ∈ [−π, π) for the
elevation angle θ = 0, when using the cosine antenna from (1.34) or an isotropic antenna.

To emphasize the relation between the antenna gain and effective area, we
return to Figure 1.3, which considered a receive antenna with the physical area
A that receives a signal from the azimuth angle φ. We previously concluded
that its effective area is A cos(φ) for φ ∈ [−π/2, π/2], but we implicitly
assumed the elevation angle was zero. When considering both angles, the
effective area becomes A(φ, θ) = A cos(φ) cos(θ) by the same arguments. If
we further assume (for the sake of argument) that the physical antenna area
is A = 4A_iso = λ²/π, where A_iso = λ²/(4π) is the effective area of an isotropic
antenna, then the relation in (1.36) between the effective area and
antenna gain becomes

$$ \frac{\lambda^2}{\pi}\cos(\varphi)\cos(\theta) = \frac{\lambda^2}{4\pi}\, G(\varphi, \theta) \quad\Longrightarrow\quad G(\varphi, \theta) = 4\cos(\varphi)\cos(\theta) \tag{1.37} $$

for φ ∈ [−π/2, π/2] and θ ∈ [−π/2, π/2]. This result coincides with the cosine
antenna in (1.34). Hence, we have found a way to tie the concepts together:
A patch antenna with a physical area that is 4 times larger than A_iso has a 4
times higher maximum gain. The gain function varies according to a cosine
pattern since the patch looks smaller from non-perpendicular viewing angles.
Example 1.11. There are many other cosine-type radiation patterns in the
field of antenna design than the one defined in (1.34). As an example, consider
the gain function

$$ G(\varphi, \theta) = \begin{cases} c\cos(3\varphi)\cos(\theta), & \text{if } \varphi \in [-\pi/6, \pi/6],\ \theta \in [-\pi/2, \pi/2], \\ 0, & \text{elsewhere}. \end{cases} \tag{1.38} $$

If this antenna is known to be lossless, what should be the value of the scalar
c > 0? What is the maximum antenna gain?

The left-hand side of the lossless antenna condition in (1.27) becomes

$$ \frac{1}{4\pi}\int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2} G(\varphi, \theta)\cos(\theta)\,\partial\theta\,\partial\varphi = \frac{c}{4\pi}\int_{-\pi/6}^{\pi/6}\int_{-\pi/2}^{\pi/2} \cos(3\varphi)\cos^2(\theta)\,\partial\theta\,\partial\varphi = \frac{c}{4\pi}\underbrace{\int_{-\pi/6}^{\pi/6}\cos(3\varphi)\,\partial\varphi}_{=2/3}\ \underbrace{\int_{-\pi/2}^{\pi/2}\cos^2(\theta)\,\partial\theta}_{=\pi/2} = \frac{c}{12}. \tag{1.39} $$

We notice that this value only becomes 1 if c = 12. The maximum antenna
gain is achieved in the direction φ = θ = 0 and is G_max = c = 12.
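Because the integrand in (1.39) factorizes into an azimuth part and an elevation part, the normalization constant can be found with two one-dimensional integrals. The minimal Python sketch below uses a simple midpoint rule. The helper name `midpoint_integral` and the number of sample points are illustrative choices.

```python
import math


def midpoint_integral(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h


# The two factors in (1.39); analytically 2/3 and pi/2.
azimuth_part = midpoint_integral(lambda phi: math.cos(3 * phi), -math.pi / 6, math.pi / 6)
elevation_part = midpoint_integral(lambda t: math.cos(t) ** 2, -math.pi / 2, math.pi / 2)

# Average gain per unit c is (1/(4*pi)) * (2/3) * (pi/2) = 1/12, so c = 12.
average_gain_per_unit_c = azimuth_part * elevation_part / (4 * math.pi)
c = 1 / average_gain_per_unit_c
g_max_dbi = 10 * math.log10(c)  # the maximum G = c is reached at phi = theta = 0
print(c, g_max_dbi)
```

The result is c ≈ 12, which corresponds to a maximum gain of about 10.8 dBi. This is higher than the 6 dBi of the cosine antenna, because the azimuth beam is three times narrower.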


1.1.4 Revisiting the Signal-to-Noise Ratio 


We will now revisit the SNR calculation and incorporate arbitrary antenna gains.
The SNR was defined in (1.13) as SNR = Pβ/σ² and depends on the channel
gain β. The channel gain in free-space propagation with isotropic transmit
and receive antennas was computed in (1.7) as λ²/(4πd)², where d is the distance.
We can generalize this expression for arbitrary antennas as [9]

$$ \beta = \frac{\lambda^2}{(4\pi d)^2}\, G_t(\varphi_t, \theta_t)\, G_r(\varphi_r, \theta_r), \tag{1.40} $$

where G_t(φ, θ) is the antenna gain function of the transmitter and G_r(φ, θ)
is the antenna gain function of the receiver. These functions are defined for
an arbitrary azimuth angle φ and elevation angle θ, but the functions are
evaluated in (1.40) for the angles (φ_t, θ_t) at the transmitter that lead to the
receiver and the angles (φ_r, θ_r) at the receiver that lead to the transmitter.
Figure 1.12 illustrates this setup and, particularly, makes the point that the
transmitter and receiver measure the angles based on their local coordinate
systems. The antenna gain functions can then have their peak values at
φ = θ = 0, irrespective of how the transmitter and receiver are rotated with
respect to each other.
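Equation (1.40) can be evaluated for different antenna combinations with a short Python sketch. The functions below implement the isotropic antenna and the cosine antenna (1.34) in each antenna's local coordinate system. The distance, the wavelength and the 60-degree misalignment are illustrative values, and the helper names are hypothetical.

```python
import math


def cosine_gain(phi, theta):
    """Cosine antenna (1.34), angles measured in the antenna's local frame."""
    if abs(phi) <= math.pi / 2 and abs(theta) <= math.pi / 2:
        return 4 * math.cos(phi) * math.cos(theta)
    return 0.0


def isotropic_gain(phi, theta):
    return 1.0


def channel_gain(d, wavelength, gain_t, angles_t, gain_r, angles_r):
    """Free-space channel gain (1.40) with transmit/receive antenna gain functions."""
    return (wavelength / (4 * math.pi * d)) ** 2 * gain_t(*angles_t) * gain_r(*angles_r)


d, wavelength = 100.0, 0.1  # illustrative values
beta_iso = channel_gain(d, wavelength, isotropic_gain, (0, 0), isotropic_gain, (0, 0))
beta_aligned = channel_gain(d, wavelength, cosine_gain, (0, 0), cosine_gain, (0, 0))
beta_offset = channel_gain(d, wavelength, cosine_gain, (math.pi / 3, 0),
                           isotropic_gain, (0, 0))
for name, beta in [("isotropic", beta_iso), ("aligned cosine", beta_aligned),
                   ("cosine at 60 deg + isotropic", beta_offset)]:
    print(f"{name}: {10 * math.log10(beta):.2f} dB")
```

Two perfectly aligned cosine antennas give 4 · 4 = 16 times (about 12 dB) more gain than the isotropic case. A 60-degree azimuth offset at a cosine transmitter leaves only 4 cos(60°) = 2, which is about 3 dB.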

It might seem strange to call (1.40) the channel gain when it also contains
the antenna gains at the transmitter and receiver. However, this is unavoidable
since the effective area of the receiver always determines the fraction of the
transmit power that is received, even when the transmitter is isotropic. One
must always make assumptions regarding the antenna gains to compute a
channel gain. Hence, the channel starts at the input to the transmit antenna
and ends at the output from the receive antenna.

Figure 1.12: The transmitter sees the receiver in the angular direction (φ_t, θ_t), measured
using the transmitter’s local coordinate system. The receiver sees the transmitter in the angular
direction (φ_r, θ_r), measured using the receiver’s local coordinate system. These angles can be
used when evaluating the antenna gains in (1.40).

The channel gain in (1.40) is an increasing function of the antenna gains
G_t(φ_t, θ_t) and G_r(φ_r, θ_r), which gives the impression that it is preferable
to have strongly directive antennas in wireless communications. This is a
valid conclusion for fixed wireless links where the person who deploys the
transmitter and receiver can rotate the antennas so that the maximum gains
are achieved precisely at the angles (φ_t, θ_t) and (φ_r, θ_r). This is the case for
links between a geostationary satellite and receivers on the ground (e.g., using
parabolic dish antennas to receive television broadcasts) or for fixed wireless
broadband links where the customer has a fixed receive antenna on the outside
of their house pointing towards the nearest base station.

The situation is more complicated in mobile communications, as illustrated 
in Figure 1.13, where a rooftop-mounted base station serves Receiver 1 and 
Receiver 2. The receivers are mobile phones, and it is not reasonable to require 
the users to hold their phones in precisely the right directions all the time. 
Hence, nearly isotropic antennas are utilized in mobile devices to ensure that 
almost the same SNR is achieved irrespective of how the device is rotated. The
transmitter in Figure 1.13 emits a signal with an antenna gain function that
is illustrated to resemble that of a cosine antenna. Receiver 1 happens to be
located in the direction with the maximum antenna gain. In contrast, Receiver
2 is located behind a building and can only be reached if the wireless signals
are reflected off another building, as indicated in the figure. This receiver will
experience a low antenna gain since the transmitter’s gain function is low
in the angular direction leading to the receiver. This example pinpoints the
practical tradeoff between having a large maximum antenna gain and having a
wide coverage area (wide enough to cover all prospective users) when selecting
the antenna to be used at a base station.

Figure 1.13: An example of a mobile communication scenario where the transmitting base
station has a directive antenna. The maximum antenna gain is achieved in the direction leading
to Receiver 1. The path leading to Receiver 2 experiences a weak antenna gain.

Ideally, we would like to rotate the antenna gain function depending on 
the receiver’s location, so we can always provide the maximum antenna gain. 
This could be achieved by mechanically rotating the base station antenna, but 
it is quite impractical since receivers can move rapidly. The preferred practical 
solution is to use multiple antennas to rotate the directivity of transmitted 
signals using the theory developed in later chapters of this book. 

The free-space channel gain in (1.40) can also be expressed in terms of
the effective areas $A_t(\varphi_t, \theta_t)$ and $A_r(\varphi_r, \theta_r)$. By using the relation stated in
(1.36), an equivalent version of (1.40) is

$$\beta = \frac{A_t(\varphi_t, \theta_t)\, A_r(\varphi_r, \theta_r)}{(d\lambda)^2}. \qquad (1.41)$$


The impact of the antenna design and wavelength on the free-space channel 
gain can be understood by inspecting (1.40) and (1.41). If the antenna gains 
in (1.40) are constant as we reduce the wavelength $\lambda$ (i.e., increase the carrier
frequency), then the channel gain $\beta$ will reduce proportionally to $\lambda^2$. This
reduces the SNR because the effective receive antenna area is reduced, so the
receiver captures less power. This is manifested by the relationship between
area and gain in (1.36). On the other hand, if the effective areas in (1.41) are
constant as we reduce the wavelength, then the channel gain $\beta$ will instead
increase proportionally to $\lambda^{-2}$ when $\lambda$ is reduced. This results in an SNR
improvement because the antenna gains are increased; that is, the antennas 
become more directive. This is beneficial if the transmit and receive antennas 
are aligned to deliver the maximum antenna gains to the communication 
system. In other words, the high band can provide better channel conditions
in free-space propagation than the low band if we compare two systems with
equal-sized antennas that are perfectly aligned. This is one of the features 
that fixed wireless links rely on (e.g., communication with geostationary 
satellites). Using the high-band spectrum for mobile communications, where 
the physical directions of the devices’ antennas change over time, requires 
that the directivity can change accordingly to keep them directed toward the 
base station. We will explore how this is achieved using multiple antennas. 
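The two opposite wavelength trends can be checked numerically. The following Python sketch evaluates (1.40) with fixed (isotropic) antenna gains and (1.41) with fixed effective areas at two carrier frequencies; the 100 m distance, the 0.01 m² areas, and the 3 GHz and 30 GHz carriers are illustrative assumptions, not values from the text.

```python
import math

C = 3e8  # speed of light [m/s] (rounded)

def gain_fixed_gains(wavelength, d, g_t=1.0, g_r=1.0):
    """Free-space channel gain (1.40): beta = G_t * G_r * (lambda / (4*pi*d))**2."""
    return g_t * g_r * (wavelength / (4 * math.pi * d)) ** 2

def gain_fixed_areas(wavelength, d, a_t, a_r):
    """Equivalent form (1.41): beta = A_t * A_r / (d * lambda)**2."""
    return a_t * a_r / (d * wavelength) ** 2

d = 100.0     # illustrative distance [m]
area = 0.01   # illustrative effective area of each antenna [m^2]
for f in (3e9, 30e9):
    lam = C / f
    fixed_g = 10 * math.log10(gain_fixed_gains(lam, d))
    fixed_a = 10 * math.log10(gain_fixed_areas(lam, d, area, area))
    print(f"f = {f / 1e9:4.0f} GHz: fixed gains {fixed_g:7.2f} dB, "
          f"fixed areas {fixed_a:7.2f} dB")
```

A tenfold increase in frequency lowers the fixed-gain value by 20 dB and raises the fixed-area value by 20 dB, matching the $\lambda^2$ and $\lambda^{-2}$ scalings. Plugging the isotropic area $\lambda^2/(4\pi)$ into `gain_fixed_areas` reproduces `gain_fixed_gains` with unit gains, which is the relation (1.36) at work.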


Example 1.12. How does the SNR in free-space propagation depend on the
wavelength if the base station has a fixed, wavelength-independent effective
antenna area $A_t(\varphi_t, \theta_t)$ while the user device has an isotropic antenna?
The effective area of the isotropic receive antenna is $A_r(\varphi_r, \theta_r) = \frac{\lambda^2}{4\pi}$. We
can compute the SNR using (1.41) as

$$\mathrm{SNR} = \frac{P\beta}{N_0 B} = \frac{P}{N_0 B}\,\frac{A_t(\varphi_t, \theta_t)\, A_r(\varphi_r, \theta_r)}{(d\lambda)^2} = \frac{P}{N_0 B}\,\frac{A_t(\varphi_t, \theta_t)}{4\pi d^2}. \qquad (1.42)$$

This expression is independent of $\lambda$ since the two wavelength-dependent effects
cancel out. The area of the receive antenna is proportional to $\lambda^2$, while the
gain of the transmit antenna is obtained from (1.36) as $G_t(\varphi_t, \theta_t) = \frac{4\pi}{\lambda^2} A_t(\varphi_t, \theta_t)$,
which is inversely proportional to $\lambda^2$ when the area is fixed. Hence, if the
wavelength shrinks, the receiver becomes physically smaller but captures the 
same signal power since the transmit antenna becomes more directive. The 
same principle applies when the device transmits, but then the radiated signal 
is isotropic and induces a frequency-independent power flux density on the 
fixed-area receive antenna. 
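Example 1.12 can also be checked numerically. The sketch below computes the SNR from (1.41) with a fixed transmit area and the isotropic receive area $\lambda^2/(4\pi)$ for several carrier frequencies; the transmit power, noise spectral density, bandwidth, distance, and area are hypothetical values chosen only for illustration.

```python
import math

def snr_fixed_tx_area(p, n0, bandwidth, d, a_t, wavelength):
    """SNR = P * beta / (N0 * B), with beta from (1.41) and an isotropic receiver."""
    a_r = wavelength ** 2 / (4 * math.pi)      # effective area of an isotropic antenna
    beta = a_t * a_r / (d * wavelength) ** 2   # channel gain (1.41)
    return p * beta / (n0 * bandwidth)

# Hypothetical parameters (not from the text)
P, N0, B, d, A_t = 1.0, 4e-21, 10e6, 500.0, 0.1
for f in (1e9, 3e9, 30e9):
    lam = 3e8 / f
    snr_db = 10 * math.log10(snr_fixed_tx_area(P, N0, B, d, A_t, lam))
    print(f"f = {f / 1e9:4.0f} GHz: SNR = {snr_db:.2f} dB")
```

All three frequencies give the same SNR, equal to the wavelength-free closed form $P A_t(\varphi_t, \theta_t)/(N_0 B\, 4\pi d^2)$ in (1.42).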


A general parametric channel gain model was defined in (1.9) as a function
of the pathloss exponent $\alpha$ and the channel gain $\Upsilon$ at a 1 m reference distance.
The parameter values are normally stated for isotropic antennas but can be
used along with other antennas by multiplying with the antenna gains:

$$\beta = \Upsilon \left(\frac{1\,\mathrm{m}}{d}\right)^{\alpha} G_t(\varphi_t, \theta_t)\, G_r(\varphi_r, \theta_r). \qquad (1.43)$$
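In decibels, (1.43) turns the products into sums, which is a convenient way to evaluate such models. A minimal sketch, where the values of $\Upsilon$, $\alpha$, and the antenna gains in the usage line are hypothetical:

```python
import math

def channel_gain_db(d, upsilon_db, alpha, g_t_dbi=0.0, g_r_dbi=0.0):
    """Channel gain (1.43) in dB: 10*log10(Upsilon * (1 m / d)**alpha * G_t * G_r).

    d is the distance in meters; upsilon_db is the isotropic gain at 1 m in dB;
    the antenna gains are given in dBi (0 dBi corresponds to isotropic antennas).
    """
    return upsilon_db - 10 * alpha * math.log10(d / 1.0) + g_t_dbi + g_r_dbi

# Hypothetical example: Upsilon = -40 dB, alpha = 3.5, 15 dBi transmit antenna
print(channel_gain_db(100.0, -40.0, 3.5, g_t_dbi=15.0))
```

With these assumed values, the gain at 100 m is about -95 dB: 70 dB of pathloss beyond the 1 m reference, partly compensated by the 15 dBi transmit antenna.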
1.2 Three Main Benefits of Having Multiple Antennas


This book will cover how using multiple antennas can improve the operation 
of wireless communication systems. We have already provided some hints of 
what the benefits could be in the context of mobile communications, where 
the location and rotation of the transmitter/receiver change with time. In this 
section, we will describe the three main categories of benefits that multiple 
antenna communication systems have over conventional systems with a single 
antenna at the transmitter and receiver. These benefits have been given several 
different names over the years. In this book, we call them: 


1. Beamforming gain; 
2. Spatial multiplexing; 
3. Spatial diversity. 


These benefits will be introduced below, including a short historical exposé,
and then covered in further detail in later chapters. 


1.2.1 Beamforming Gain 


The wireless telegraph was invented in the 1890s as the first system for 
wireless communications. The technology used Morse code to transfer words 
encoded as a sequence of “dots” and “dashes”, represented by transmitting 
sinusoidal signal pulses of two different durations. The wireless telegraph 
played an essential role during the First World War since it allowed for direct 
communication between continents [15]. The distance from North America to 
Europe is more than 5000 km; thus, if the channel gain is computed as in (1.7), 
it would be much smaller than the values shown in Figure 1.4. To reach over 
the oceans, the radio stations had to broadcast their signals with very high 
transmit power (tens of kilowatts). Therefore, researchers started to look for 
ways to achieve directive transmission and reception to reduce the transmit 
power or to reach even further distances with the same power. This was 
where multiple antenna communications appeared as a solution (in addition 
to using directive antennas). Guglielmo Marconi made the first transatlantic 
transmission in 1901 using two tall antenna poles in the United Kingdom 
[16]. Karl Ferdinand Braun did an experiment using three antennas in 1905, 
which he described publicly when he and Marconi shared the Nobel Prize in 
Physics in 1909 [17]. Ernst F. W. Alexanderson filed a patent application in 
1917 describing the first practical implementation of radio communications 
[18]. The patent did not use the term beamforming but outlined all the same 
benefits as will be described in this section. The implementation was analog 
then, while current systems are digitally controlled. Some early field trials for 
mobile communications in the 1990s are described in [19], [20]. 
To exemplify the basic phenomenon that was discovered and utilized in the
early 1900s, we consider the transmission of a time-limited sinusoidal pulse

$$p(t) = \begin{cases} \sqrt{2}\,\sin(2\pi f t), & \text{if } t \in [0, T], \\ 0, & \text{otherwise}, \end{cases} \qquad (1.44)$$

where $f$ is the frequency and the time duration is $T = l/f$, for some integer
$l > 0$. This means the pulse consists of $l$ full periods of the sine wave. The
power of this pulse is

$$\frac{1}{T}\int_0^T p^2(t)\,\partial t = \frac{2}{T}\int_0^T \sin^2(2\pi f t)\,\partial t = \frac{1}{T}\int_0^T 1\,\partial t - \frac{1}{T}\int_0^T \cos(4\pi f t)\,\partial t = 1, \qquad (1.45)$$

where we utilize the trigonometric identity $\sin^2(x) = (1 - \cos(2x))/2$ and
notice that the last integral is zero since we integrate over $2l$ periods.
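The unit power in (1.45), and the $A^2$ scaling used in (1.46) below, can be confirmed with a simple numerical integration. This Python sketch uses the midpoint rule; the carrier frequency, the number of periods $l$, and the amplitude are arbitrary choices for illustration.

```python
import math

def pulse(t, f, T):
    """Time-limited sinusoidal pulse p(t) from (1.44)."""
    return math.sqrt(2) * math.sin(2 * math.pi * f * t) if 0 <= t <= T else 0.0

def average_power(signal, T, n=20_000):
    """Approximate (1/T) * integral_0^T signal(t)**2 dt with the midpoint rule."""
    dt = T / n
    return sum(signal((k + 0.5) * dt) ** 2 for k in range(n)) * dt / T

f = 3e9       # illustrative carrier frequency [Hz]
l = 5         # number of full periods in the pulse
T = l / f     # pulse duration
A = 3.0       # illustrative on-off keying amplitude

print(average_power(lambda t: pulse(t, f, T), T))      # approximately 1
print(average_power(lambda t: A * pulse(t, f, T), T))  # approximately A**2 = 9
```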

The Morse code is transmitted using on-off keying, which means we switch 
between transmitting the sinusoidal pulse Ap(t) with an amplitude A > 0 and 
being silent. If we transmit the pulse with amplitude A, then the transmitted 
signal power is computed as 


$$\frac{1}{T}\int_0^T \big(A\,p(t)\big)^2\,\partial t = A^2\,\frac{1}{T}\int_0^T p^2(t)\,\partial t = A^2. \qquad (1.46)$$


We notice that the signal power is proportional to the square of the pulse’s
amplitude. The received signal at some destination will be $\sqrt{\beta}A\,p(t)$, where
the channel gain $\beta$ represents the signal propagation loss and can, for example,
be computed as described in (1.7) for free-space propagation with isotropic
antennas or in (1.43) for arbitrary antennas and propagation modeling. In
any case, the received signal power is $\beta A^2$, which is also proportional to $A^2$.

Suppose the received signal is too weak for the receiver to decode the 
Morse code accurately. If we want to increase the received signal power by 
100 times (i.e., 20 dB), we can increase the signal amplitude by a factor of 10,
from A to 10A. The transmitted signal power will then instead be 


$$\frac{1}{T}\int_0^T \big(10A\,p(t)\big)^2\,\partial t = (10A)^2\,\frac{1}{T}\int_0^T p^2(t)\,\partial t = 100A^2. \qquad (1.47)$$

This means we need to spend 100 times more transmit power to receive the
signal $\sqrt{\beta}\,10A\,p(t)$ that contains 100 times more power.

An alternative solution is to generate the original signal Ap(t) at 10 
different transmit antennas. Each signal has a power of $A^2$, so this approach
requires a total transmit power of

$$\sum_{m=1}^{10} \frac{1}{T}\int_0^T \big(A\,p(t)\big)^2\,\partial t = 10A^2. \qquad (1.48)$$
We can then radiate these signals simultaneously from the multiple antennas
and let them add/superimpose constructively over the air. In this way, the
received signal will also be $\sqrt{\beta}\,10A\,p(t)$, but we only need to spend 10 times
more power, instead of 100 times more as in the single-antenna case. In other 
words, if the destination requires a specific received signal power level to 
decode the information successfully, we can satisfy that requirement using 
only 1/10 of the power when using 10 transmit antennas instead of one. 

In general, if we compare single-antenna transmission with transmission 
from M antennas, we can reduce the total transmit power by a factor of 1/M 
while keeping the received signal power constant. How is this possible? It 
might seem that additional signal power is “magically” created when the M 
transmitted signals are combined in the air. The simple yet physically accurate 
explanation is that the transmission becomes spatially directed toward the 
receiver. In other words, when observed at a distant receiver, the combination 
of M transmitted signals looks like the signal emitted from a single “virtual” 
antenna with high directivity; that is, a virtual antenna having an M times 
higher antenna gain than the individual physical antennas had. 
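The power bookkeeping above can be reproduced by sampling the signals over one period. In this sketch, the channel gain $\beta$ is a hypothetical value, and all $M$ components are assumed to reach the receiver time-synchronized with the same $\beta$, as in the direction of constructive interference discussed next.

```python
import math

def tx_and_rx_power(M, A, beta, f=1.0, n=10_000):
    """Return (total transmit power, received power) when M antennas send A*p(t).

    Assumes the M components arrive time-synchronized with the same channel gain
    beta, so they add coherently; p(t) is the sinusoid of (1.44) over one period.
    """
    T = 1 / f
    tx_power = M * A ** 2          # each antenna radiates A**2, by (1.46)
    rx_power = 0.0
    for k in range(n):
        t = (k + 0.5) * T / n
        p = math.sqrt(2) * math.sin(2 * math.pi * f * t)
        y = sum(math.sqrt(beta) * A * p for _ in range(M))  # superposition over the air
        rx_power += y ** 2 / n
    return tx_power, rx_power

beta = 1e-6  # hypothetical channel gain
tx_single, rx_single = tx_and_rx_power(M=1, A=10.0, beta=beta)  # amplitude 10A, one antenna
tx_array, rx_array = tx_and_rx_power(M=10, A=1.0, beta=beta)    # amplitude A, ten antennas
print(tx_single / tx_array)  # 10.0: the array needs M times less transmit power
print(rx_single, rx_array)   # both approximately 100 * beta * A**2
```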

Figure 1.14 shows an array with M = 4 isotropic antennas deployed on a 
line. The adjacent antennas are separated by half a wavelength: $\lambda/2 = c/(2f)$.
If all the antennas transmit the signal Ap(t) simultaneously, then each of the 
emitted signal components will radiate as in the single-antenna case described 
earlier. A superposition of the M signal components can be observed at every 
point in space. The components have, generally, traveled different distances 
to reach the considered point and, thus, are time-delayed differently. 

Let us consider points many wavelengths away from the array (i.e., in its 
far-field) so that the propagation distance is much larger than the distance 
between the individual antennas. For any such point on the horizontal axis 
in Figure 1.14, the distances to each antenna will be roughly the same. This 
can be understood by considering the triangle in Figure 1.15, which has 
corners at two different antennas and the considered receiver location along 
the horizontal axis. Hence, the $M$ signal components will be approximately
time-synchronized, and the received signal becomes $M\sqrt{\beta}A\,p(t)$. This is the
constructive interference behavior that we are looking for. However, for any
point on the vertical axis in Figure 1.14, the distances to the antennas differ by
integer multiples of $\lambda/2$. This distance difference remains even if the considered
point of the receiver is far away. The corresponding time delay difference
between two adjacent antennas is $\tau = \frac{\lambda}{2c} = \frac{1}{2f}$, which
corresponds to half a period of the sine wave:


$$\sin\big(2\pi f(t - \tau)\big) = \sin(2\pi f t - \pi) = -\sin(2\pi f t). \qquad (1.49)$$

Hence, the signals emitted from two adjacent antennas cancel out along the
vertical direction, which is called destructive interference. The horizontal and vertical
axes represent the extreme cases, while partially constructive or destructive
interference can be observed elsewhere, as indicated in Figure 1.14.
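The angle dependence can be quantified with phasors. For a far-field point at angle $\varphi$ from the horizontal axis, the path lengths from adjacent antennas on the vertical axis differ by approximately $(\lambda/2)\sin(\varphi)$, which corresponds to a phase difference of $\pi\sin(\varphi)$. The sketch below (a standard array-factor computation, not code from this book) sums $M$ unit phasors with these phase differences; the squared magnitude is the received power relative to a single antenna transmitting the same per-antenna signal.

```python
import cmath
import math

def array_power_gain(M, phi, spacing_wavelengths=0.5):
    """|sum of M unit phasors|**2 at far-field angle phi (radians) from broadside.

    Antennas lie on the vertical axis with the given spacing (in wavelengths).
    The value M**2 means fully constructive interference; 0 means full cancellation.
    """
    dphase = 2 * math.pi * spacing_wavelengths * math.sin(phi)
    return abs(sum(cmath.exp(1j * m * dphase) for m in range(M))) ** 2

M = 4
for deg in (0, 15, 30, 60, 90):
    print(f"phi = {deg:2d} deg: {array_power_gain(M, math.radians(deg)):7.3f} (max {M ** 2})")
```

The horizontal direction ($\varphi = 0$) gives $M^2$, while the vertical direction ($\varphi = 90^\circ$) gives zero for the even $M$ used in Figure 1.14.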
Figure 1.14: When transmitting the same signal from all the antennas in a one-dimensional
array, the signal components will propagate time-synchronously in the direction perpendicular 
to the array, leading to constructive interference in the horizontal direction in this figure. On the 
other hand, the signals will propagate non-synchronously in other directions leading to partially 
constructive or fully destructive interference. 
Figure 1.15: If the distance $\lambda/2$ between the two transmit antennas is much smaller than the
propagation distances to the receive antenna, then we have approximately the same distance d 
from both transmit antennas. If the antennas transmit the same signal, constructive interference 
will occur at the receiver. 
Example 1.13. An array with $M = 2$ isotropic antennas is located at the
Cartesian coordinates $(0, +\lambda/4, 0)$ and $(0, -\lambda/4, 0)$, where $\lambda$ is the wavelength.
The sinusoidal pulse $p(t)$ in (1.44) is transmitted from both antennas with
the amplitude $A/\sqrt{2}$, so the total transmit power is $A^2$. What is the received
power at a point with spherical coordinates $(d, \varphi, 0)$, assuming that $d \gg \lambda$ and
the channel gain $\beta$ is the same from both transmit antennas to the receiver?
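We let $d_1$ and $d_2$ denote the distances to the receiver from the antennas
at $(0, +\lambda/4, 0)$ and $(0, -\lambda/4, 0)$, respectively. The received signal becomes

$$\sqrt{\beta}A\sin\left(2\pi f\left(t - \frac{d_1}{c}\right)\right) + \sqrt{\beta}A\sin\left(2\pi f\left(t - \frac{d_2}{c}\right)\right). \qquad (1.50)$$

By using the trigonometric identity $\sin(a) + \sin(b) = 2\sin\left(\frac{a+b}{2}\right)\cos\left(\frac{a-b}{2}\right)$,
(1.50) can be expressed as

$$2\sqrt{\beta}A\sin\left(2\pi f\left(t - \frac{d_1 + d_2}{2c}\right)\right)\cos\left(\frac{\pi}{\lambda}(d_2 - d_1)\right). \qquad (1.51)$$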

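To determine its power, we need $d_1$ and $d_2$. The Cartesian coordinates of the
receiver are $(d\cos(\varphi), d\sin(\varphi), 0)$. Since $d \gg \lambda$, $d_1$ can be approximated as

$$d_1 = \sqrt{d^2\cos^2(\varphi) + \left(d\sin(\varphi) - \frac{\lambda}{4}\right)^2} = d\sqrt{1 - \frac{\lambda\sin(\varphi)}{2d} + \frac{\lambda^2}{16d^2}} \approx d\left(1 - \frac{\lambda\sin(\varphi)}{4d} + \frac{\lambda^2}{32d^2}\right) \approx d - \frac{\lambda\sin(\varphi)}{4} \qquad (1.52)$$

by using that $\sqrt{1+x} \approx 1 + \frac{x}{2}$ for $|x| \ll 1$. Similarly, $d_2$ can be approximated
as $d_2 \approx d + \frac{\lambda\sin(\varphi)}{4}$. We can now approximate (1.51) as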


$$\underbrace{\sqrt{2\beta}A\sin\left(2\pi f\left(t - \frac{d}{c}\right)\right)}_{\text{Received signal at distance } d \text{ with a single antenna}}\;\underbrace{\sqrt{2}\cos\left(\frac{\pi}{2}\sin(\varphi)\right)}_{\text{Angle-dependent multiplicative factor}}, \qquad (1.53)$$

which is the product of the signal received with a single transmit antenna
and an angle-dependent factor that describes the constructive/destructive
interference. By integrating the square of (1.53) over one signal period and
utilizing that $\int_0^1 2\sin^2(2\pi t)\,\partial t = 1$, we obtain the received signal power

$$2\cos^2\left(\frac{\pi}{2}\sin(\varphi)\right)\beta A^2. \qquad (1.54)$$


The largest power $2\beta A^2$ is achieved if $\varphi = 0$ or $\varphi = \pi$, as in Figure 1.14, which
is twice the received power compared to a single antenna using the same total
power. Destructive interference occurs when $\varphi = \pm\pi/2$ since $\cos(\pm\pi/2) = 0$,
while half of the maximum power is received when $\varphi = \pi/6$.
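The approximations in (1.52)-(1.53) can be checked by evaluating (1.50) with the exact distances and integrating its square over one period. The sketch below works in normalized units where $\lambda = c = 1$ (so $f = 1$) and uses $d = 100\lambda$, $\beta = 1$, and $A = 1$; these values are illustrative assumptions.

```python
import math

def exact_power(phi, d=100.0, A=1.0, beta=1.0, n=4000):
    """Average power of (1.50) over one period, using exact distances d_1 and d_2.

    Normalized units: wavelength = c = f = 1. Antennas at (0, +1/4) and (0, -1/4),
    receiver at (d cos(phi), d sin(phi)) in the plane z = 0.
    """
    rx = (d * math.cos(phi), d * math.sin(phi))
    d1 = math.dist(rx, (0.0, 0.25))
    d2 = math.dist(rx, (0.0, -0.25))
    power = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        y = (math.sqrt(beta) * A * math.sin(2 * math.pi * (t - d1))
             + math.sqrt(beta) * A * math.sin(2 * math.pi * (t - d2)))
        power += y ** 2 / n
    return power

def approx_power(phi, A=1.0, beta=1.0):
    """Far-field approximation (1.54)."""
    return 2 * math.cos(math.pi / 2 * math.sin(phi)) ** 2 * beta * A ** 2

for deg in (0, 30, 60, 90):
    phi = math.radians(deg)
    print(f"phi = {deg:2d} deg: exact {exact_power(phi):.6f}, (1.54) {approx_power(phi):.6f}")
```

At $d = 100\lambda$, the exact and approximate values agree to several decimals, which supports dropping the $\lambda^2/(32d)$ term in (1.52).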
Figure 1.16: The received signal power in different directions and distances when transmitting
1 W from an isotropic antenna. The color shows the received signal power in dBm when using
(1.7) to compute the channel gain in free-space propagation with f = 3 GHz as the frequency.


We will now illustrate the constructive, partially constructive, and destruc- 
tive interference behavior when using multiple antennas and compare it to the 
single-antenna transmission case. Figure 1.16 shows how the signal power from 
a single isotropic transmit antenna spreads out over an area of 200 × 200 m.
The transmitter is located at the origin and transmits a signal with 1 W of
power. The color shows the received signal power in dBm, and we use the
free-space model in (1.7) to compute the channel gain $\beta$ at different distances.
We notice that the signal power spreads out identically in all directions and 
decays with distance. If we rotate the figure around the origin, the pattern 
remains the same, as expected when using an isotropic transmit antenna. 
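The values behind a plot like Figure 1.16 are straightforward to compute. The sketch below assumes that (1.7) is the isotropic free-space gain $\beta = (\lambda/(4\pi d))^2$, that is, (1.40) with unit antenna gains, and converts the received power $P\beta$ to dBm; the speed of light is rounded to $3 \times 10^8$ m/s.

```python
import math

def rx_power_dbm(p_tx_w, d, f, c=3e8):
    """Received power in dBm for isotropic antennas in free space."""
    lam = c / f
    beta = (lam / (4 * math.pi * d)) ** 2         # assumed form of (1.7)
    return 10 * math.log10(p_tx_w * beta / 1e-3)  # convert W to dBm

for d in (10.0, 50.0, 100.0):
    print(f"d = {d:5.1f} m: {rx_power_dbm(1.0, d, 3e9):.1f} dBm")
```

With 1 W at 3 GHz, the received power is about -52 dBm at 100 m, and each tenfold increase in distance costs 20 dB. The direction does not enter the formula, which is why the pattern in Figure 1.16 is rotationally symmetric.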

In contrast, a transmitter with an array of M = 10 isotropic antennas 
is considered in Figure 1.17. The antennas are deployed along the vertical
axis with $\lambda/2$ antenna spacing, as illustrated in Figure 1.14, and are centered
around the origin. Exactly the same signal is simultaneously transmitted from
all the antennas. Figure 1.17(a) considers the case when the total transmit
power is 0.1 W (i.e., scaled down as $1/M$), which leads to 0.01 W per antenna.
Figure 1.17(b) considers the case when the total transmit power is 1 W (i.e.,
the same as in the single-antenna case); thus, the power per antenna is 0.1 W.
Although each antenna radiates its signal isotropically, the figure shows that 
the combined effect is a directive signal in the two horizontal directions. Hence, 
we create constructive and destructive interference patterns aligned with the 
previous discussion related to Figure 1.14. 

(a) M = 10 transmit antennas with a total transmit power of 0.1 W (0.01 W per antenna).
(b) M = 10 transmit antennas with a total transmit power of 1 W (0.1 W per antenna).

Figure 1.17: The received signal power in different directions and at different distances, when
transmitting the same signal from M = 10 isotropic antennas located at the origin. The color
shows the received signal power in dBm when using (1.7) to compute the channel gain in
free-space propagation with f = 3 GHz as the signal frequency.

The constructive interference pattern in Figure 1.17 takes the shape of
a beam (also known as a lobe), and the antenna array is therefore said to
perform beamforming. There is a strong main beam along the horizontal axis,
but also several side-beams pointing in other directions, usually referred to as 
side-lobes. By comparing Figure 1.16 and Figure 1.17(a), we can notice that a 
receiver located in the direction of the beam (i.e., along the horizontal axis) 
will receive the same power in both cases. However, the transmit power has 
been reduced by a factor of $M$ in Figure 1.17(a), so we can deliver the same
wireless communication service but save power. This is called an M-times 
beamforming gain or array gain. Receivers in other directions will receive less 
power when using multiple antennas because there is no magical appearance 
of signal power but only a power redistribution from some angular directions 
to other directions. In particular, no signal power is observed along the vertical 
axis. Hence, beamforming can be both a blessing and a curse—it is helpful if 
the main beam points in the direction preferred by the receiver and can be 
detrimental otherwise. This issue resembles that of using directive antennas 
(described earlier), but there is a crucial difference: an individual antenna has 
a fixed antenna gain function, while the direction of the beam from an antenna 
array can be controlled when using beamforming. The ability to change the 
direction is often seen as an inherent part of the beamforming concept, but it
has also been called adaptive beamforming. Various methods to point beams 
toward the desired receivers are developed later in this book. 


In Figure 1.17(b), the total transmit power is the same as in the single- 
antenna case. The received signal power for a user located along the horizontal 
axis is then M times stronger than in the single-antenna case. Hence, the 
beamforming gain provides a stronger received signal for users that the beam 
is pointed toward. There are many directions outside the main beam where 
less power is received than in the single-antenna case. 
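The two cases in Figure 1.17 can be compared along the horizontal axis with a simple phasor model. This sketch assumes $\lambda/2$ spacing, identical signals with power $P/M$ per antenna, the same free-space gain $\beta = (\lambda/(4\pi d))^2$ from every antenna (far field), and $f = 3$ GHz; the function name and the 100 m evaluation distance are illustrative.

```python
import cmath
import math

def array_rx_power_dbm(p_total_w, M, d, phi, f=3e9, c=3e8):
    """Far-field received power (dBm) from M isotropic antennas on the vertical axis.

    All antennas send the same signal with power p_total_w / M; phi is measured
    from the horizontal axis; the spacing is half a wavelength.
    """
    lam = c / f
    beta = (lam / (4 * math.pi * d)) ** 2          # free-space gain per antenna
    amp = math.sqrt(p_total_w / M)                 # per-antenna amplitude
    dphase = math.pi * math.sin(phi)               # phase step between neighbors
    field = sum(amp * cmath.exp(1j * m * dphase) for m in range(M))
    return 10 * math.log10(beta * abs(field) ** 2 / 1e-3)

d = 100.0
single = array_rx_power_dbm(1.0, 1, d, 0.0)    # one antenna, 1 W
case_a = array_rx_power_dbm(0.1, 10, d, 0.0)   # Figure 1.17(a): 0.1 W in total
case_b = array_rx_power_dbm(1.0, 10, d, 0.0)   # Figure 1.17(b): 1 W in total
print(f"{single:.2f} dBm, {case_a:.2f} dBm, {case_b:.2f} dBm")
```

Case (a) matches the single-antenna value, while case (b) is 10 dB (a factor $M$) higher, in line with the two interpretations of the beamforming gain.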


The fact that beamforming distributes the transmit power unequally 
between different angular directions is illustrated in Figure 1.18, where a 
sphere is centered around the array. The color illustrates the received power 
level at different points on the sphere relative to the maximum value. The 
x-axis corresponds to the horizontal axis in the previous figures, the y-axis 
corresponds to the vertical axis, while the z-axis was not visible before. As M 
increases, the black stripe where the received signal power is high will contain 
a larger and larger fraction of the transmit power but also become narrower. If 
one were to integrate over the sphere to sum up all the power, the result would
always be equal to the total transmit power, irrespective of the value of $M$.
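The power-conservation statement can be verified numerically for the $\lambda/2$-spaced array considered here. For isotropic antennas on a line, the power pattern depends only on $u = \cos(\psi)$, where $\psi$ is the angle to the array axis, and $u$ is uniformly distributed on $[-1, 1]$ for directions that are uniformly distributed on a sphere. The sketch below averages the pattern, normalized so that a single antenna with the same total power gives 1, over the sphere; an average of 1 means that the total radiated power equals the total transmit power. The half-wavelength spacing matters for this simple model: with other spacings, the sphere average of the phasor model is not exactly 1.

```python
import cmath
import math

def normalized_pattern(M, u):
    """|array factor|**2 / M at direction cosine u, for half-wavelength spacing."""
    return abs(sum(cmath.exp(1j * math.pi * m * u) for m in range(M))) ** 2 / M

def sphere_average(M, n=20_000):
    """Average of normalized_pattern over the sphere (midpoint rule in u on [-1, 1])."""
    return sum(normalized_pattern(M, -1 + (k + 0.5) * 2 / n) for k in range(n)) / n

for M in (1, 4, 10):
    print(f"M = {M:2d}: peak {normalized_pattern(M, 0.0):5.1f}, "
          f"sphere average {sphere_average(M):.6f}")
```

The peak grows as $M$ while the sphere average stays at 1, which is the redistribution of power described above.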


In summary, the beamforming gain can be utilized to achieve an M times 
higher SNR than in the single-antenna case using the same transmit power, 
or it can be used to achieve the same SNR using M times less transmit power. 
Although the example above considers transmission from M antennas to a 
single-antenna receiver, the same gains can be achieved when transmitting 
with one antenna to M receive antennas. We will study this in detail later in 
this book. Beamforming distributes the transmit power unequally over
different angular directions, similar to a single directive antenna (compare
Figure 1.10 and Figure 1.18), but with the vital difference that the directivity
of an antenna array can be changed, as described next.

Figure 1.18: The normalized received power on different parts of a sphere centered around an
array with M = 10 isotropic antennas, in the same setup as in Figure 1.17. The color shows
the normalized received power in dB-scale where the maximum value is 0 dB. All distances are
normalized.


1.2.2 Spatial Multiplexing 


Many wireless systems have more than one user and must multiplex their 
communication services on the shared wireless channel. Traditionally, the 
users are multiplexed by assigning non-overlapping time-frequency resources; 
for example, different time intervals and/or frequency bands. The reason for 
this system design is to avoid interference. If two signals are radiated with 
equal power from an isotropic antenna at the same time and frequency, each 
signal will propagate isotropically as illustrated in Figure 1.16. At every point 
in space, a superposition of the two signals will be observed where the signals 
remain equally strong. Each receiver is only interested in one of the two signals. 
When measuring the corresponding communication quality, the ratio between
the desired signal’s power and the sum of the interfering signals’ power
and the noise power is a common performance metric. This is known as the
signal-to-interference-plus-noise ratio (SINR) and is a generalization of the
SNR metric to situations with interference:

$$\mathrm{SINR} = \frac{\text{Received signal power}}{\text{Received interference power} + \text{Noise power}}. \qquad (1.55)$$
The SINR is always smaller than or equal to the SNR because we obtain
the SNR by removing the interference from the denominator in (1.55). The 
interference is problematic when the SINR is much smaller than the SNR 
and might severely limit communication performance. For example, when the 
signal and interference powers in (1.55) are equally large, the SINR cannot 
surpass 1. In contrast, the SNR values exemplified in Figure 1.6 can be many 
orders-of-magnitude larger than one (e.g., 30 dB is 1000). This issue cannot be 
addressed using a directive transmit antenna since both signals will be radiated 
with the same directivity. The following example proves this mathematically. 


Example 1.14. Consider an isotropic antenna that transmits to two receivers
with the same channel gain $\beta \in (0, 1]$. It assigns power $P_1 > 0$ to receiver 1
and power $P_2 > 0$ to receiver 2. Suppose an SINR of 1 (i.e., 0 dB) is needed
for reliable communication. Is it possible to select the powers $P_1$ and $P_2$ so
that the transmitter can communicate to both receivers reliably?

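If we let $\sigma^2 > 0$ denote the noise power, then we can use (1.55) to obtain
the SINR achieved by the first receiver:

$$\mathrm{SINR}_1 = \frac{P_1\beta}{P_2\beta + \sigma^2} = \frac{P_1}{P_2 + \frac{\sigma^2}{\beta}}. \qquad (1.56)$$

Similarly, the SINR achieved by the second receiver is

$$\mathrm{SINR}_2 = \frac{P_2\beta}{P_1\beta + \sigma^2} = \frac{P_2}{P_1 + \frac{\sigma^2}{\beta}}. \qquad (1.57)$$

For jointly reliable communication to the two receivers, both $\mathrm{SINR}_1$ and $\mathrm{SINR}_2$
must be greater than or equal to 1, which is equivalent to the conditions

$$P_1 \geq P_2 + \frac{\sigma^2}{\beta}, \qquad (1.58)$$

$$P_2 \geq P_1 + \frac{\sigma^2}{\beta}. \qquad (1.59)$$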


Since both inequalities require one power to be strictly larger than the other
one, they cannot be satisfied simultaneously. This happens even if the channel
gain is large, so that $\sigma^2/\beta$ is small. Only in the hypothetical noise-free case of
$\sigma^2/\beta = 0$ can reliable communication be guaranteed for both receivers (by
selecting $P_1 = P_2$). Even in that case, the common SINR cannot surpass 1. This is why single-antenna
systems avoid interference by letting the users take turns communicating.
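The infeasibility in Example 1.14 can also be illustrated by brute force. The sketch below evaluates (1.56)-(1.57) on a grid of power pairs; the grid and the value of $\sigma^2/\beta$ are hypothetical. A search like this only illustrates the result; the proof is the pair of inequalities (1.58)-(1.59).

```python
def sinr_pair(p1, p2, noise_over_beta):
    """SINRs (1.56)-(1.57) when one isotropic antenna serves two receivers."""
    return p1 / (p2 + noise_over_beta), p2 / (p1 + noise_over_beta)

def feasible_pairs(noise_over_beta, powers, target=1.0):
    """All grid pairs (P1, P2) for which both SINRs reach the target."""
    return [(p1, p2) for p1 in powers for p2 in powers
            if min(sinr_pair(p1, p2, noise_over_beta)) >= target]

grid = [0.005 * k for k in range(1, 201)]   # hypothetical grid: 5 mW to 1 W
print(len(feasible_pairs(1e-3, grid)))      # 0: no pair works when sigma^2/beta > 0
print(len(feasible_pairs(0.0, grid)))       # 200: only P1 = P2 works without noise
```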


Using multiple antennas fundamentally changes the situation since each 
radiated signal can have a unique spatial directivity. Recall that Figure 1.17 
illustrated a situation where a signal is focused along the horizontal axis, 
so the signal vanishes entirely along the vertical axis. Hence, a device that 
is located in that direction will not observe any interference at all. With
this phenomenon in mind, the concept of spatial multiplexing, also known
as space-division multiple access (SDMA), was conceived in the late 1980s
and early 1990s [21]-[24]. The key idea was to equip the base stations in
cellular networks with multiple antennas and exploit beamforming to suppress 
interference between the users, thereby enabling efficient communications 
where multiple users are using the same time and frequency resource. The 
SDMA concept had been considered for satellite systems decades earlier [25]. 

Suppose p(t) is the signal transmitted to the receiver. When all the transmit 
antennas emit this signal simultaneously, a particular pattern of constructive 
and destructive interference is created, as exemplified in Figure 1.14. Other 
patterns can be generated by emitting different signals from the antennas; in 
particular, we can transmit a time-shifted copy of p(t), where we adapt the 
time-shift to obtain constructive interference in any direction or at any point 
of choice. The methodology of adaptive beamforming is to: 


1. Measure the propagation time delays $\tau_1, \ldots, \tau_M$ from each of the $M$
transmit antennas to the intended receiver.


2. Compensate for the time delays by transmitting the signal $p(t)$ earlier
from the more distant antennas in the array: $x_m(t) = p(t + \tau_m)$ is the
signal transmitted from the $m$th antenna.


3. All the signal components arrive at exactly the same time at the intended
receiver since the received signal is an attenuated version of

$$x_1(t - \tau_1) + \ldots + x_M(t - \tau_M) = p(t + \tau_1 - \tau_1) + \ldots + p(t + \tau_M - \tau_M) = M\,p(t). \qquad (1.60)$$
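The three steps can be sketched in a few lines of Python. The example below works in normalized units ($c = f = 1$, so $\lambda = 1$), places $M = 10$ antennas with $\lambda/2$ spacing on the vertical axis, ignores the common attenuation and the time window of the pulse, and targets a hypothetical receiver far away on the vertical axis, where the uncompensated signals cancel.

```python
import math

def p(t, f=1.0):
    """The sinusoid of (1.44) without its time window (a simplification)."""
    return math.sqrt(2) * math.sin(2 * math.pi * f * t)

def delays(antennas, target, c=1.0):
    """Step 1: propagation delays tau_m from each antenna to the target."""
    return [math.dist(a, target) / c for a in antennas]

def received_power(antennas, target, compensate, n=2000):
    """Average power over one period of sum_m x_m(t - tau_m) (attenuation ignored)."""
    taus = delays(antennas, target)
    # Step 2: x_m(t) = p(t + tau_m) with compensation, x_m(t) = p(t) without it.
    advances = taus if compensate else [0.0] * len(taus)
    power = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        # Step 3: the component from antenna m arrives with delay tau_m.
        y = sum(p(t + adv - tau) for adv, tau in zip(advances, taus))
        power += y ** 2 / n
    return power

M = 10
antennas = [(0.0, (m - (M - 1) / 2) * 0.5) for m in range(M)]  # lambda/2 spacing
target = (0.0, 1000.0)                                          # far away, vertical axis
print(received_power(antennas, target, compensate=False))  # close to 0
print(received_power(antennas, target, compensate=True))   # close to M**2 = 100
```

The compensated sum equals $M p(t)$, so its power is $M^2$ times that of $p(t)$, in agreement with (1.60).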


Suppose we want to direct the signal towards a user located on the vertical 
axis in Figure 1.14 instead of the horizontal axis. Since the antennas are 
separated by a distance $\lambda/2$, the geometry implies that each transmitted
signal becomes time-shifted by half a period compared to the signal from the 
adjacent antenna. Hence, if we emit a signal already shifted by half a period, 
the two effects cancel out at every point on the vertical axis. The result is 
shown in Figure 1.19, where the main beams point along the vertical axis, 
while the signal components cancel out along the horizontal axis. Apart from 
the angular rotation of the beamforming, the general behavior is the same as 
before: the beamforming gain from the M antennas can be either utilized to 
achieve the same SNR as in the single-antenna case using M times less total 
transmit power (as in Figure 1.19(a)) or achieve M times higher SNR (as in 
Figure 1.19(b)) using the same total power. 

(a) M = 10 transmit antennas with a total transmit power of 0.1 W (0.01 W per antenna).
(b) M = 10 transmit antennas with a total transmit power of 1 W (0.1 W per antenna).

Figure 1.19: The received signal power in different directions and at different distances, when
transmitting time-shifted signals from M = 10 isotropic antennas located at the origin. The time
shifts are selected to achieve constructive interference along the vertical axis. The color shows
the received signal power in dBm when using (1.7) to compute the channel gain in free-space
propagation, and f = 3 GHz is the signal frequency.

Figure 1.20: The normalized received power on different parts of a sphere centered around an
array with M = 10 isotropic antennas, in the same setup as in Figure 1.19. The color shows the
normalized received power in dB-scale where the maximum value is 0 dB.

The beamforming gain is once again achieved by redistributing the transmit
power between different angular directions. Figure 1.20 illustrates the received
power level at different points on a sphere centered around the array. The
power is focused in one direction (the same pattern appears at the back
of the sphere that is not visible). As M increases, the black dot where the 
received signal power is high will contain a larger and larger fraction of the 
transmit power but also become smaller. Although the pattern on the sphere 
differs from Figure 1.18, we can always obtain the original transmit power by
integrating over the sphere to sum up all the radiated power.

How is this example related to spatial multiplexing? Suppose two users are 
located in sufficiently different spatial directions. There will be low interference 
if each user is located outside the other user’s main beam. Hence, these users 
can be served at the same time and frequency while achieving a decent SINR 
(much higher than that in the single-antenna case). Ideally, the data rate 
becomes proportional to the number of users. If K users are served by spatial 
multiplexing, then K times more data can be transmitted compared to the 
single-user case if the beamforming deals with the interference. A basic setup 
of spatial multiplexing is illustrated in Figure 1.21. 

The term “interference” has two different meanings in this context. The 
physical phenomenon of constructive/destructive interference determines how 
the signal copies emitted from multiple antennas superimpose over the air to 
form a directive beam. Moreover, when a signal reaches an unintended receiver,
it is also called interference, but for a different reason: the interfering signal’s power
is included in the denominator of the SINR. In the remainder of this book,
we will only use the term in the latter sense.
Example 1.15. Two isotropic transmit antennas are deployed with a $\lambda/2$
separation and transmit to two single-antenna receivers, located as in Figure 1.21.
The one-bit data intended for receiver $k$ is represented by $s_k \in \{-1, 1\}$, for
$k = 1, 2$. It is multiplied by the sinusoidal pulse in (1.44) before transmission.
Suppose both receivers need an SINR of 1 (i.e., 0 dB) to reliably decode their
data and that $\sigma^2/\beta = 10^{-1}$ W. Is it possible to select the transmit powers $P_1$
and $P_2$ to enable reliable communication to both receivers simultaneously?

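Since the distance to receiver 1 is identical for the two transmit antennas,
we can focus a beam towards this receiver by transmitting $\sqrt{P_1/2}\,s_1 p(t)$ from
both antennas, where the power $P_1$ is divided equally. To focus a beam on
receiver 2, the two antennas can transmit $\sqrt{P_2/2}\,s_2 p(t)$ and $\sqrt{P_2/2}\,s_2 p\left(t + \frac{\lambda}{2c}\right)$,
where the delay is selected to compensate for the propagation delay difference
of $\frac{\lambda}{2c}$. The received signal at receiver 1 then becomes

$$\begin{aligned}
y_1(t) &= \underbrace{2\sqrt{\frac{P_1\beta}{2}}\,s_1\,p(t - \tau_1)}_{\text{Desired signal}} + \underbrace{\sqrt{\frac{P_2\beta}{2}}\,s_2\left(p(t - \tau_1) + p\left(t + \frac{\lambda}{2c} - \tau_1\right)\right)}_{\text{Interference from the second signal}} + \underbrace{n_1(t)}_{\text{Noise}} \\
&= \sqrt{2P_1\beta}\,s_1\,p(t - \tau_1) + n_1(t),
\end{aligned} \qquad (1.61)$$

where $\tau_1$ is the propagation delay and $n_1(t)$ is the noise. The second equality
follows from $p\left(t + \frac{\lambda}{2c} - \tau_1\right) = -p(t - \tau_1)$, as stated in (1.49).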
The received signal at receiver 2 becomes 


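$$
\begin{aligned}
y_2(t) &= \sqrt{\frac{\beta P_1}{2}}\, s_1 \underbrace{\left( p(t-\tau_{2,1}) + p(t-\tau_{2,2}) \right)}_{=0}
+ \sqrt{\frac{\beta P_2}{2}}\, s_2 \underbrace{\left( p(t-\tau_{2,1}) + p\left(t + \frac{1}{2f} - \tau_{2,2}\right) \right)}_{=2p(t-\tau_{2,1})}
+ \underbrace{n_2(t)}_{\text{Noise}} \\
&= \sqrt{2P_2\beta}\, s_2 p(t-\tau_{2,1}) + n_2(t),
\end{aligned} \tag{1.62}
$$

where $n_2(t)$ is the noise while $\tau_{2,1}$ and $\tau_{2,2}$ are the propagation delays from
the first and second transmit antenna, respectively. The interference vanishes
since $\tau_{2,2} = \tau_{2,1} + \frac{1}{2f}$ and $p(t - \tau_{2,2}) = p\big(t - \frac{1}{2f} - \tau_{2,1}\big) = -p(t - \tau_{2,1})$.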
Since there is no interference and $s_1^2 = s_2^2 = 1$, the SINRs at receiver 1
and receiver 2 respectively become

$$
\mathrm{SINR}_1 = \frac{2\beta P_1}{\sigma^2} = 20 P_1, \qquad \mathrm{SINR}_2 = \frac{2\beta P_2}{\sigma^2} = 20 P_2. \tag{1.63}
$$

For jointly reliable communication to the two receivers, we need $\mathrm{SINR}_1 \geq 1$
and $\mathrm{SINR}_2 \geq 1$, which is equivalent to $20P_1 \geq 1$ and $20P_2 \geq 1$. We notice
that $P_1$ and $P_2$ can be selected independently and that both conditions are
satisfied if the powers are greater than or equal to 50 mW.
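The interference cancellation in (1.61) and (1.62) can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions, not the book's own method: the pulse in (1.44) is replaced by a hypothetical unwindowed sinusoid `p` with unit average power (so the edge effects of a time-limited pulse are ignored); the carrier frequency `f`, the channel gain `beta`, and the delays `tau1` and `tau21` are arbitrary illustrative values; only the ratio $\sigma^2/\beta = 0.1$ W is taken from the example. The second antenna is assumed to be $\lambda/2$ farther from receiver 2 than the first one.

```python
import numpy as np

f = 3e9               # carrier frequency in Hz (assumed; not specified in the example)
beta = 1e-9           # channel gain (assumed); only sigma2 / beta matters
sigma2 = 0.1 * beta   # noise power chosen so that sigma^2 / beta = 0.1 W
half = 1 / (2 * f)    # delay corresponding to a lambda/2 path difference

# Ten full carrier periods, 100 samples per period
t = np.arange(1000) / (100 * f)


def p(t):
    """Hypothetical stand-in for the pulse in (1.44): an unwindowed sinusoid
    with unit average power, so that p(t + 1/(2f)) = -p(t) holds exactly."""
    return np.sqrt(2) * np.sin(2 * np.pi * f * t)


def power(x):
    """Average power of a sampled waveform."""
    return np.mean(x ** 2)


def sinrs(P1, P2, s1=1, s2=-1, tau1=1.3e-7, tau21=2.1e-7):
    """Return (SINR_1, SINR_2) for the beamforming in Example 1.15."""
    tau22 = tau21 + half                      # antenna 2 is lambda/2 farther from receiver 2
    a1, a2 = np.sqrt(beta * P1 / 2), np.sqrt(beta * P2 / 2)
    # Receiver 1 (equal delays): desired signal and interference, cf. (1.61)
    d1 = a1 * s1 * (p(t - tau1) + p(t - tau1))
    i1 = a2 * s2 * (p(t - tau1) + p(t + half - tau1))
    # Receiver 2: interference and desired signal, cf. (1.62)
    i2 = a1 * s1 * (p(t - tau21) + p(t - tau22))
    d2 = a2 * s2 * (p(t - tau21) + p(t + half - tau22))
    return power(d1) / (power(i1) + sigma2), power(d2) / (power(i2) + sigma2)


print(sinrs(0.05, 0.05))  # both values should be numerically equal to 1 (0 dB)
```

With $P_1 = P_2 = 50$ mW, both SINRs equal 1, in line with (1.63). Since the interference terms are numerically zero, each SINR scales linearly with its own power only, which is why the two powers can be selected independently.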
Figure 1.21: Schematic illustration of spatial multiplexing where two users are served at the
same time and frequency, but their signals are transmitted using different beamforming. The 
true beam patterns are those shown in Figure 1.17 and Figure 1.19. 


The previous example substantiated the claim that we can communicate 
simultaneously with two receivers thanks to the use of multiple antennas. This 
was impossible in the single-antenna case analyzed in Example 1.14. In the 
considered geometrical setup, the beamforming towards the receivers simulta- 
neously maximizes their SINRs and SNRs since the receivers in Figure 1.21 
are located in ideal perpendicular directions. When considering other receiver 
locations, the beamforming that maximizes the SINR must balance achieving 
a high SNR and avoiding interference. This corresponds to not directing each 
main beam exactly onto its intended receiver but fine-tuning the beamforming 
to balance between high signal power and low interference. These factors are 
analyzed in detail later in this book. 

Adaptive beamforming from an antenna array is a much more flexible 
solution than using a single directive antenna. When serving a single user, 
adaptive beamforming can steer the emitted signal precisely toward the 
receiver, wherever it is. This is achieved electrically by time-delaying the signals 
emitted by the individual antennas. The same effect could be achieved by 
mechanically rotating a directive antenna. However, this is only an alternative 
in free-space propagation where there is only a single path and not in the more 
complicated non-line-of-sight propagation environments that often occur in 
practice. Moreover, an antenna array can simultaneously spatially multiplex
several users with different beamforming, while a directive antenna can only 
transmit with one directivity at a time. The spatial multiplexing feature was 
used in a few commercial networks in Southeast Asia in the early 2000s [26, 
Example 10.1]. It is nowadays a widely supported feature in WiFi 5 (802.11ac) 
[27] and 5G NR [5]. It will likely be a core feature also in future systems. 

Spatial multiplexing was conceived when cellular networks were used for 
voice communications. Hence, the network performance was characterized by 
the number of user connections that could be multiplexed and how good the 
network coverage was; the latter is the fraction of all spatial locations for which 
the SNR is above the threshold required for the voice quality to be acceptable. 
Both criteria could be improved by beamforming and spatial multiplexing of 
users. When wireless technology began to transmit data packets primarily, the 
data rate (bit/s) achieved by each user also became an important performance 
metric. The spatial multiplexing concept was then extended to setups where a 
single user device has multiple antennas [28]–[31], in which case one can assign
multiple beams to the same device, and send several parallel layers of data to 
increase the data rate. The current wireless standards support a combination 
of these single-user and multi-user features [5], [27]: spatial multiplexing of 
many user devices and a few layers per device. 


1.2.3 Spatial Diversity 


In addition to increasing the SNR, using multiple antennas can improve 
the reliability of a wireless communication system. Thus far, we have mainly 
considered the free-space propagation scenario in Figure 1.1, in which there are 
no reflections or scattering: the only signal component that reaches the receiver 
is the one that has traveled along the direct path between the transmitter and 
receiver. This can be a good channel model for wireless communications in 
outer space but not for terrestrial systems where many reflecting/scattering 
objects might exist. This leads to so-called multipath propagation. 

To exemplify the basic impact of multipath propagation, suppose the
transmitter emits a pure sinusoidal signal $x(t) = \sin(2\pi f t)$, where $f$ is the
frequency. We consider the setup in Figure 1.22, where the signal reaches the
receive antenna via two paths: the direct path has a distance $d_1$, and the
reflected path has a distance $d_2$. Since electromagnetic waves travel at the
speed of light $c$, the two distances correspond to the propagation time delays

$$
\tau_i = \frac{d_i}{c} = \frac{d_i}{\lambda f}, \quad \text{for } i = 1,2, \tag{1.64}
$$

where $\lambda = c/f$ is the wavelength. For the sake of argument, we disregard that
the two paths will have (slightly) different channel gains and omit the channel
gain parameters in this example to simplify the notation.

Figure 1.22: A basic multipath channel with a direct and reflected path.

Disregarding the additive noise, the received signal $y(t)$ can be expressed as

$$
y(t) = x(t-\tau_1) + x(t-\tau_2) = \sin\big(2\pi f(t-\tau_1)\big) + \sin\big(2\pi f(t-\tau_2)\big), \tag{1.65}
$$


where each term is called a multipath component. Depending on the relationship
between the time delays $\tau_1$ and $\tau_2$, the two multipath components in (1.65) can
either reinforce or cancel each other. Using a trigonometric identity,¹⁰ we can
rewrite (1.65) as

$$
y(t) = 2\cos\big(\pi f(\tau_1-\tau_2)\big) \underbrace{\sin\left(2\pi f\left(t - \frac{\tau_1+\tau_2}{2}\right)\right)}_{\text{Delayed version of the signal}}, \tag{1.66}
$$


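where $2\cos\big(\pi f(\tau_1 - \tau_2)\big)$ is the constant amplitude of the received signal and
$\sin\big(2\pi f(t - \frac{\tau_1+\tau_2}{2})\big) = x\big(t - \frac{\tau_1+\tau_2}{2}\big)$ is a version of the transmitted signal with
the average time delay. The constant amplitude can be rewritten as

$$
2\cos\big(\pi f(\tau_1 - \tau_2)\big) = 2\cos\left(\pi f\left(\frac{d_1}{\lambda f} - \frac{d_2}{\lambda f}\right)\right) = 2\cos\left(\pi\frac{d_1 - d_2}{\lambda}\right) \tag{1.67}
$$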
by utilizing (1.64). We notice that this amplitude has a sign and can take any
value between $-2$ and $+2$ depending on the argument of the cosine function.
By comparing this amplitude with the unit amplitude achieved with only the
direct path, we notice that multipath propagation can be either a blessing or
a curse. In particular, if $(d_1 - d_2)/\lambda$ is an integer, then (1.67) equals $\pm 2$,
and we benefit from having two paths by getting twice the amplitude. This
happens because the signals received over the two paths have identical phases.
On the other hand, (1.67) is zero when $d_1$ and $d_2$ differ by $\lambda/2$ (plus any integer
number of wavelengths), because then the signals received over the two paths
have opposite phases and their multipath components cancel out. When this
happens (exactly or approximately), the channel is said to be in a deep fade.
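The product form (1.66) and the amplitude (1.67) can be sanity-checked numerically. A minimal Python sketch; the carrier frequency `f`, the direct-path length `d1`, and the time grid are assumed illustrative values, and channel gains and noise are ignored as in the text.

```python
import numpy as np

c = 3e8            # speed of light (m/s)
f = 3e9            # assumed carrier frequency (Hz)
lam = c / f        # wavelength (m)


def two_path_signal(t, d1, d2):
    """Received signal (1.65): sum of the direct and reflected sinusoids."""
    tau1, tau2 = d1 / c, d2 / c
    return np.sin(2 * np.pi * f * (t - tau1)) + np.sin(2 * np.pi * f * (t - tau2))


def amplitude(d1, d2):
    """Signed amplitude 2 cos(pi (d1 - d2) / lambda) from (1.67)."""
    return 2 * np.cos(np.pi * (d1 - d2) / lam)


t = np.linspace(0, 5 / f, 500)   # five carrier periods
d1 = 10.0                        # assumed direct-path length (m)
for d2 in (d1 + lam, d1 + lam / 2, d1 + 0.3 * lam):
    # Product form (1.66): amplitude times the signal with the average delay
    avg_delay = (d1 + d2) / (2 * c)
    product_form = amplitude(d1, d2) * np.sin(2 * np.pi * f * (t - avg_delay))
    assert np.allclose(two_path_signal(t, d1, d2), product_form)
    print(f"(d1 - d2)/lambda = {(d1 - d2) / lam:+.2f}, amplitude = {amplitude(d1, d2):+.3f}")
```

The first path difference gives a signed amplitude of $-2$ (magnitude 2, identical phases), while the second gives an amplitude close to zero, i.e., a deep fade.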


This phenomenon is illustrated in Figure 1.23, where the signless amplitude
$2\left|\cos\left(\pi\frac{d_1 - d_2}{\lambda}\right)\right|$ is shown. The key message is that a small change in the
distance difference $d_1 - d_2$ can make the amplitude of the received signal either


¹⁰We use the fact that $\sin(x) + \sin(y) = 2\cos\left(\frac{x-y}{2}\right)\sin\left(\frac{x+y}{2}\right)$.




[Plot: amplitude versus path difference $(d_1 - d_2)/\lambda$, for path differences between $-2$ and $2$.]

Figure 1.23: In the multipath example in Figure 1.22, the amplitude $2\left|\cos\left(\pi\frac{d_1-d_2}{\lambda}\right)\right|$ of the
received signal varies rapidly as the difference in path distances changes.


grow or fade away. The interpretation of “small” is that the change happens
when the transmitter and/or receiver move a fraction of the wavelength; for
example, $\lambda/4$ is the change in the path difference $d_1 - d_2$ that is needed to
move from the peak amplitude 2 to $\sqrt{2}$ in Figure 1.23, which corresponds
to losing half the signal power (i.e., the channel gain reduces from $2^2 = 4$ to
$(\sqrt{2})^2 = 2$). That distance is 2.5 cm if $f = 3$ GHz and 2.5 mm if $f = 30$ GHz.
These rapid channel changes are called multipath fading or small-scale fading.
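The quarter-wavelength distances above follow directly from $\lambda = c/f$. A short Python sketch, assuming $c = 3 \cdot 10^8$ m/s; `half_power_shift` is a hypothetical helper name used only for illustration.

```python
import math

c = 3e8  # speed of light (m/s)


def half_power_shift(f):
    """Change in the path difference d1 - d2 that moves the amplitude
    2|cos(pi (d1 - d2) / lambda)| from its peak 2 down to sqrt(2),
    i.e., halves the received power: a quarter of the wavelength."""
    lam = c / f
    delta = lam / 4
    # Sanity check: after the shift, the amplitude is sqrt(2)
    assert math.isclose(2 * abs(math.cos(math.pi * delta / lam)), math.sqrt(2))
    return delta


for f in (3e9, 30e9):
    print(f"f = {f / 1e9:.0f} GHz: shift = {half_power_shift(f) * 1e3:.1f} mm")
```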

The described two-path scenario resembles the behavior that appeared 
when transmitting from two different antennas in free-space propagation: the 
emitted signals are received along two paths with different time delays. The 
core difference is that with multiple transmit antennas, we can compensate 
for the time delays at the transmitter side (this is what we call adaptive 
beamforming). This is impossible in single-antenna multipath propagation 
since the two signal copies originate from the same transmit antenna. 

Since a slight movement of the transmitting and receiving devices can lead 
to huge SNR fluctuations, multipath fading is a problematic phenomenon that 
makes wireless communications fundamentally unreliable. When transmitting a 
data packet, we must select a particular digital modulation and channel coding 
scheme in advance. Based on this selection, the receiver needs a particular SNR 
level during the transmission to decode the packet correctly. This level cannot 
always be fulfilled when the SNR fluctuates after the modulation/coding has 
been selected. When a packet cannot be decoded due to the channel being 
in a deep fade, we say an outage has occurred. Interestingly, multiple receive 
antennas can be used to protect communication against outages. Pioneering 
research on this topic was performed already in the 1930s by [32], [33], and
the mathematical analysis presented in this book dates back to the 1950s [34]. 
Multiple transmit antennas can also protect against fading, but this requires 
more complex techniques, first developed in the 1990s [35]–[37]. In this section,
we will introduce the basic concepts and then return to the topic in Chapter 5. 

Suppose a random variable models the channel between the transmit and 
receive antennas: the communication works flawlessly with probability 1 — p, 
while it breaks down entirely with probability p. Hence, an outage occurs with 
probability p. This means that whenever you want to transmit a data packet, 
the outage probability is p. 

If we instead make use of two receive antennas and the channel to each 
one of them is described by an independent random variable with the same 
distribution as above, three random events can occur: 


1. Both antennas have good channels, happening with probability $(1-p)^2$;


2. One antenna has a good channel, and the other antenna experiences an
outage, which happens with probability $(1-p)p + p(1-p) = 2(1-p)p$;


3. Both antennas experience outages, which happens with probability $p^2$.


It is only in the third case that the receiver cannot decode the data packet.
Hence, the outage probability is $p^2$ in this two-antenna setup.
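The three probabilities can be obtained by enumerating the four joint outcomes of two antennas with independent outages. A minimal Python sketch; the function name and event labels are hypothetical choices for illustration.

```python
from itertools import product

LABELS = ("both good", "one outage", "both outage")


def two_antenna_events(p):
    """Probabilities of the three events when each of two antennas is
    independently in outage with probability p."""
    events = dict.fromkeys(LABELS, 0.0)
    for out1, out2 in product((False, True), repeat=2):   # True means outage
        prob = (p if out1 else 1 - p) * (p if out2 else 1 - p)
        events[LABELS[out1 + out2]] += prob                # index = number of outages
    return events


print(two_antenna_events(0.1))
```

Only the last entry, $p^2$, corresponds to a failed packet.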

By following the same logic, if we have $M$ receive antennas and each one
experiences an outage with probability $p$, then the probability that all the
antennas are simultaneously experiencing outages is $p^M$ (assuming that the
outage events occur independently for every antenna). This means that the
reliability of the communication system rapidly improves as we add more
receive antennas, a phenomenon known as spatial diversity. The name suggests that, at every
time instance, we utilize the spatial domain to combat fading; for example, by
only using those antennas that are located at spatial locations that currently
experience good channel conditions. The argument above applies when using
any antenna type. Directive antennas can be used to improve the SNR, but
they cannot be used to obtain spatial diversity; multiple antennas are needed
for that.¹¹ However, the antenna array can be actively designed to extract as
much diversity as possible in a given propagation environment. This can be 
achieved by deploying the antennas far apart, rotating their antenna gains 
differently, and making them sensitive to waves with different polarization; the 
overarching goal with this is to ensure that the antennas experience outage 
events nearly independently so that the maximum diversity can be achieved. 
The term antenna diversity is sometimes used to describe how spatial diversity 
and antenna design are utilized jointly to achieve reliable communications. 
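The $p^M$ rule can be checked with a Monte Carlo simulation of independent outage events. A minimal Python sketch, assuming NumPy is available; the number of trials and the random seed are arbitrary choices.

```python
import numpy as np


def outage_probability_mc(p, M, n_trials=200_000, seed=0):
    """Monte Carlo estimate of the probability that all M receive antennas are
    in outage simultaneously, with independent outages of probability p each."""
    rng = np.random.default_rng(seed)
    outage = rng.random((n_trials, M)) < p          # True where an antenna is in outage
    return np.all(outage, axis=1).mean()            # fraction of trials where all are in outage


for M in (1, 2, 3, 4):
    print(M, outage_probability_mc(0.5, M), 0.5 ** M)
```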


¹¹Directive antennas might reduce the impact of multipath propagation compared to isotropic
antennas, since some multipath components can “disappear” because there are low antenna gains
in their directions. However, active exploitation of spatial diversity requires multiple antennas.
Figure 1.24: The outage probability $p^M$ as a function of the number of antennas for different
values of $p$, which is the probability that an arbitrary antenna observes an outage.


The benefit of spatial diversity is illustrated in Figure 1.24 for p = 0.5 and 
p = 0.1. The vertical axis shows the outage probability using a logarithmic 
scale, while the horizontal axis shows the number of antennas on a linear scale. 
The figure demonstrates that the outage probability reduces rapidly when the
number of antennas increases. The slope of the curve becomes steeper when
$p$ is smaller since the outage probability is $10\log_{10}(p^M) = 10M\log_{10}(p)$ dB.
Hence, a single-antenna system that has a noticeable outage probability can
be greatly improved by adding more antennas. We can never achieve
a zero-valued outage probability when $p > 0$, but suppose $10^{-3} = 0.001$ is
acceptable in a practical system. The figure shows that it can be achieved using
3 antennas if $p = 0.1$ and 10 antennas if $p = 0.5$. The latter represents a very
unreliable channel, but it can be turned into a very reliable communication 
system by using the spatial diversity provided by having many antennas. 
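Inverting the decibel expression gives the number of antennas needed for a target outage probability. A small Python sketch that reproduces the two numbers read from Figure 1.24; it assumes independent outages with $0 < p < 1$ and a target in $(0, 1)$, and the function name and the round-off tolerance are our own choices.

```python
import math


def antennas_needed(p, target):
    """Smallest number of antennas M with p**M <= target, assuming independent
    outages. In decibels the condition reads 10*M*log10(p) <= 10*log10(target);
    since log10(p) < 0, dividing by it flips the inequality."""
    ratio = math.log10(target) / math.log10(p)
    return math.ceil(ratio - 1e-9)   # small tolerance against round-off when ratio is an integer


print(antennas_needed(0.1, 1e-3), antennas_needed(0.5, 1e-3))  # prints: 3 10
```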

Spatial diversity can also be utilized in the opposite scenario where the 
transmitter has multiple antennas while the receiver has a single antenna. We 
then must be mindful of both outage events for the channels between each 
transmit antenna and the receiver and the risk that the signals emitted from 
the antennas cancel over the air. A simple way to alleviate the latter issue is to 
transmit from the antennas at different times or frequencies and then let the 
receiver jointly process the received signals to retrieve the information without 
any outage. This is inefficient since the same signal must be repeated several 
times before the next signal can be transmitted. There are more efficient 
solutions called space-time codes where multiple signals are repeated at the 
same time in an intricate way that enables the receiver to exploit diversity. 
We will describe these methods in Section 5.3. 
Example 1.16. There might be a correlation between the channel conditions
experienced at the different antennas of an array. In this example, we consider
a single-antenna transmitter and a receiver with an array of $M$ antennas. We
let $B_m$ denote an outage event at receive antenna $m$, and it occurs with the
marginal probability $\Pr\{B_m\} = p$, for $m = 1,\ldots,M$, as previously in this
section. The outage probability of the channel is the joint probability that
all antennas are experiencing an outage simultaneously: $\Pr\{B_1,\ldots,B_M\}$. It
equals the product $p^M$ of the marginal probabilities when the outage events
are independent between the antennas but not if the events are correlated.

We assume that if one antenna experiences an outage, the conditional
probability for the other antennas changes to $\varrho \in [0,1]$. Hence, $\Pr\{B_1\} = p$
but $\Pr\{B_2|B_1\} = \varrho$, which can be larger or smaller than $p$ depending on the
value of $\varrho$. The typical situation in practice is that $\varrho > p$ so that an outage at
antenna 1 increases the probability of an outage at the other antennas. What
is the outage probability $\Pr\{B_1,\ldots,B_M\}$ of this channel?

Based on the assumed correlation model, the probability of an outage event at antenna $m$,
given the information that the antennas $1,\ldots,m-1$ experience outage, is

$$
\Pr\{B_m | B_1,\ldots,B_{m-1}\} = \varrho. \tag{1.68}
$$

We can then use the chain rule* for random events to compute

$$
\begin{aligned}
\Pr\{B_1,\ldots,B_M\} &= \Pr\{B_1\}\Pr\{B_2,\ldots,B_M|B_1\} \\
&= \Pr\{B_1\}\Pr\{B_2|B_1\}\Pr\{B_3,\ldots,B_M|B_1,B_2\} \\
&= \Pr\{B_1\}\prod_{m=2}^{M}\Pr\{B_m|B_1,\ldots,B_{m-1}\} = p\varrho^{M-1}.
\end{aligned} \tag{1.69}
$$


If $\varrho = 1$, so that an outage event at one antenna guarantees outages on all other
antennas, we get $\Pr\{B_1,\ldots,B_M\} = p$. There is no spatial diversity benefit
from having multiple antennas in this extreme case, but having the extra
antennas does not hurt. However, whenever $\varrho < 1$, the outage probability will
decay as $\varrho^{M-1}$ when increasing the number of antennas. On a decibel scale, the
outage probability behaves as $10\log_{10}(p\varrho^{M-1}) = 10M\log_{10}(\varrho) + 10\log_{10}(p/\varrho)$,
similar to the case with independent outage events. If we were to add a new
curve to Figure 1.24 that represents this new scenario with correlated outages,
it would decay similarly to the existing curves, but the slope would depend on
$\varrho$ rather than on the marginal probability $p$. The key conclusion is
that the spatial diversity brought by having multiple antennas helps lower the
outage probability compared to the single-antenna case, even if the outage
events are correlated between the antennas.
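The closed form (1.69) can be compared with a Monte Carlo simulation of the correlation model. A minimal Python sketch; `rho` stands for the conditional outage probability given that all previous antennas are in outage, and the values of `p`, `rho`, and `M` are arbitrary illustrative choices with `rho > p`, the typical case mentioned above.

```python
import numpy as np


def correlated_outage_mc(p, rho, M, n_trials=200_000, seed=1):
    """Monte Carlo estimate of Pr{B_1, ..., B_M} for the model in Example 1.16.

    Antenna 1 is in outage with probability p. Given that antennas 1, ..., m-1
    are all in outage, antenna m is in outage with probability rho. The model
    leaves the remaining conditionals unspecified; they do not affect the joint
    outage event, so only "all antennas so far are in outage" is tracked."""
    rng = np.random.default_rng(seed)
    all_out = rng.random(n_trials) < p             # outage at antenna 1
    for _ in range(M - 1):
        all_out &= rng.random(n_trials) < rho      # next antenna also in outage
    return all_out.mean()


p, rho, M = 0.5, 0.8, 4
print(correlated_outage_mc(p, rho, M))   # Monte Carlo estimate
print(p * rho ** (M - 1))                # closed form (1.69)
print(p ** M)                            # independent outages, for comparison
```

With these values, the correlated outage probability is noticeably larger than in the independent case, but it still decreases as antennas are added.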


*If $A$ and $B$ are two random events, then the chain rule says that $\Pr\{A, B\} = \Pr\{A\}\Pr\{B|A\}$.
The rule can be expanded by including more than two events and can then be applied repeatedly,
as done in (1.69).
1.3 Exercises


Exercise 1.1. Since the power levels in wireless communications can be extremely 
different, it is convenient to use decibel scales. 


(a) What is 1 mW expressed in dBm?
(b) What is 30 dBm expressed in watts?


(c) Suppose we transmit a signal with power $P_{\mathrm{tx}} = 20$ dBm and that 90 dB is lost
on the way to the receiver. What is the received signal power $P_{\mathrm{rx}}$? Express the
answer in both dBm and mW.


(d) Suppose the noise power is $N_0 B = -100$ dBm. What is the SNR $P_{\mathrm{rx}}/(N_0 B)$ at
the receiver? What is the unit of the SNR?


Exercise 1.2. The SNR determines how much data can be transmitted per modulation 
symbol in a wireless communication system. The system is not operational if the SNR is 
below a specific value, in which case we are out-of-coverage. In this exercise, we consider 
a system that is operational when the SNR is equal to or larger than $-10$ dB.


(a) A single-antenna base station communicates with a single-antenna user device. 
The base station transmits with 10 W and the device with 0.1 W. The channel 
gain is $-110$ dB, the bandwidth is 10 MHz, and the noise power spectral density
is $10^{-17}$ W/Hz. Compute the SNRs achieved in the uplink and downlink.


(b) The computation in (a) reveals that the uplink SNR is below $-10$ dB. Hence, the
system is not operational, even if the downlink SNR is above $-10$ dB. This is a
common issue that can be resolved using multiple antennas at the base station. 
How many antennas are needed in this case, if the SNR is proportional to the 
number of antennas? 


(c) Can we instead change how much bandwidth is used? If yes, explain how and
what the consequences will be. If no, explain why.


Exercise 1.3. The parametric channel gain model in (1.9) is entirely determined by the
channel gain values at two different distances. Suppose the channel gain is $-100$ dB at
$d = 100$ m and $-135$ dB at $d = 1000$ m.


(a) What are the values of the pathloss exponent $\alpha$ and the constant $\Upsilon$? Assume that
the measurements were made using isotropic antennas.


(b) Suppose the measurements were made using short dipoles with antenna gains
of 1.5 at the transmitter and the receiver. What are the values of the pathloss
exponent $\alpha$ and the constant $\Upsilon$?


Exercise 1.4. Consider a (hypothetical) antenna with the gain function 


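$$
G(\varphi,\theta) = \begin{cases}
c\cos(4\varphi + \pi)\cos(\theta), & \text{if } \varphi \in [-3\pi/8, -\pi/8],\ \theta \in [-\pi/2, \pi/2], \\
c\cos(3\varphi - \pi)\cos(\theta), & \text{if } \varphi \in [\pi/6, \pi/2],\ \theta \in [-\pi/2, \pi/2], \\
0, & \text{elsewhere},
\end{cases} \tag{1.70}
$$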


where c > 0 is a constant. 


(a) If the antenna is lossless, what should be the value of c? 


(b) What is the maximum effective area of this antenna, and for which angles $(\varphi, \theta)$
is it achieved?
Exercise 1.5. A single-antenna transmitter located at the point with Cartesian coordinates
$(300, 400, 0)$ m communicates in free space with a single-antenna receiver located
at the origin. The transmit power is 20 dBm, the carrier frequency is $f = 3$ GHz, and
the bandwidth is $B = 10$ MHz. Due to noise amplification in the receiver hardware, the
noise power spectral density is $N_0 F$, where $N_0$ is given in (1.11) and $F = 4$ dB is called
the noise figure.


(a) What is the SNR if isotropic antennas are used? 


(b) What is the SNR if antennas with the cosine gain function in (1.34) are used? 
The transmit antenna achieves its maximum gain in the azimuth plane in the 
negative y-axis direction, while the receive antenna achieves its maximum gain in 
the positive y-axis direction. This setup is shown to the left in the figure below. 


(c) If the transmit antenna in (b) is rotated clockwise by $\pi/2$ radians in the azimuth
plane, what is the SNR? This setup is shown to the right in the figure below. Note
that the antenna gain pattern in (1.34) should be rotated accordingly.


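[Figure: two panels, each with the receiver at the origin and the transmitter at $(x, y) = (300, 400)$; the left panel shows the setup in (b) and the right panel the setup in (c).]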


Exercise 1.6. Consider an isotropic transmit antenna and a flat receive antenna having 
the width a. For simplicity, the antennas are located in the same two-dimensional plane, 
and the geometry is similar to Figure 1.8 but rotated. The transmitter is located at the 
origin $(0,0)$. The receive antenna covers the line segment from $(\sqrt{3}d/2,\ d/2 - a/2)$ to
$(\sqrt{3}d/2,\ d/2 + a/2)$, where $d$ is the propagation distance to the center $(\sqrt{3}d/2,\ d/2)$ of
the receive antenna.


(a) Suppose $d \gg a$ and the transmitted signal has wavelength $\lambda$. What are the
approximate phase differences between the signal received at the center and the
signals received at the two corners?


(b) Suppose $d = 2a^2/\lambda$, which is the Fraunhofer distance defined in (1.18). What is the
maximum phase difference between two points on the receive antenna? Is this
value in line with the definition of the Fraunhofer distance?


Exercise 1.7. Consider a transmitter with an array of $M = 3$ isotropic antennas located
at the Cartesian coordinates $(\lambda/4, 0, -\lambda/2)$, $(0, -\lambda/3, 0)$, and $(0, 0, \lambda/2)$, where $\lambda$ is the
wavelength. The transmitted signal from the $m$th antenna is $x_m(t) = A p(t + \tau_m)$, where
$p(t)$ is the sine pulse in (1.44). We want to maximize the received signal power at the
spherical coordinate $(d, \varphi, \theta)$, where $d \gg \lambda$. What values of the delays $\tau_1$, $\tau_2$, and $\tau_3$ can
be selected? Is the solution unique?
Exercise 1.8. A transmitter equipped with $M = 2$ antennas communicates with a
single-antenna receiver. The propagation time delays from the first and second antennas
are denoted by $\tau_1$ and $\tau_2$, respectively. The transmitter compensates for these delays
by transmitting the signal $x_m(t) = A_m p(t + \tau_m)$ from the $m$th antenna, for $m = 1,2$,
where $A_m > 0$ is the amplitude and $p(t)$ is the sine pulse in (1.44). The total transmit
power is $A_1^2 + A_2^2$.


(a) Suppose the channel gains to the receiver are the same for both antennas and
denoted by $\beta$. If the total transmit power must be equal to $P$, what values of $A_1$
and $A_2$ maximize the received signal power?


(b) Suppose the channel gains from the first and second antenna are denoted by $\beta_1$
and $\beta_2$, respectively. If the total transmit power must be equal to $P$, what values
of $A_1$ and $A_2$ should be selected to maximize the received signal power?


(c) If $\beta_1 > \beta_2$, which antenna will transmit with the highest power according to the
answer in (b)?


Exercise 1.9. Consider a transmitter with two isotropic antennas that emit the same
signal $A p(t)$, where $A$ is the amplitude and $p(t)$ is the sine pulse in (1.44). The antennas
are located at the Cartesian coordinates $(0, y_0, 0)$ and $(0, -y_0, 0)$, for some value of
$y_0 > 0$. We are interested in receiver locations with spherical coordinates $(d, \varphi, 0)$ that
are at a large but fixed distance $d \gg y_0$ from the transmitter but have varying azimuth
angle $\varphi$. The channel gain $\beta$ is the same from both antennas to any of these points.


(a) What is the minimum value of $y_0$ for which destructive interference occurs at
$(d, \varphi, 0)$ for at least one $\varphi \in [-\pi, \pi)$?

(b) For what range of $y_0$ values will constructive interference occur at $(d, \varphi, 0)$ for six
different values of $\varphi \in [-\pi, \pi)$?


Exercise 1.10. Consider a transmitter array with two isotropic antennas having the
Cartesian coordinates $(0, \lambda/4, 0)$ and $(0, -\lambda/4, 0)$, respectively. These antennas jointly
transmit signals to two receivers located at the spherical coordinates $(d, 0, 0)$ and
$(d, \pi/3, 0)$, respectively. The distance $d$ is large, so the channel gain $\beta$ is the same
between any transmit antenna and receive antenna. The time-limited pulse in (1.44)
is used to carry the two symbols $s_1, s_2 \in \{-1, 1\}$ intended for the two receivers. To
beamform towards the first receiver, both transmit antennas send $\sqrt{P_1}\, s_1 p(t)$ using some
power $P_1$. Moreover, to beamform towards the second receiver, the two antennas transmit
$\sqrt{P_2}\, s_2 p(t)$ and $\sqrt{P_2}\, s_2 p\big(t + \frac{\sqrt{3}}{4f}\big)$, respectively, using some power $P_2$. Suppose that
$\sigma^2/\beta = 10^{-1}$ W. Is it possible to select the powers $P_1$ and $P_2$ so that both receivers
achieve an SINR of 10 dB? If yes, give an example of how it can be done.


Exercise 1.11. Communication systems that operate in the mmWave and sub-THz bands 
are sensitive to signal blockage by the human body, which might lower the received 
power by more than 20 dB. To circumvent this issue, a handheld device can be built with 
antennas at different sides (e.g., at the top and on the right side) to make it unlikely 
that they are all blocked simultaneously by the user. 


(a) Consider a device with two antennas. The outage probability of one antenna is $p$.
However, if one antenna is in outage, the outage probability of the other antenna
reduces to $\varrho < p$ thanks to the antenna placement. What is the probability that
both antennas are in outage simultaneously?


(b) How much larger is the outage probability with independent outage events compared
to the probability in (a)?


Chapter 2 


Theoretical Foundations 


This book is dedicated to analyzing multiple antenna communication systems, 
and we will rely on methods from linear algebra, probability theory, signal 
processing, and information theory. This chapter will describe the key results 
from these fields that we will utilize in later chapters, using the notation and
terminology adopted in the remainder of this book. The reader is expected to
be familiar with these general topics since the chapter mainly summarizes 
essential results, and we refer to other textbooks for an in-depth introduction. 
The focus is on complex numbers and how they enter into the aforementioned 
theory when developing concise models of communication systems. 


2.1 Complex Numbers and Algebra 


Complex numbers naturally appear when analyzing communication systems, 
for example, since the frequency representation of signals and systems is 
generally complex. The fundamental component of complex numbers is the 
imaginary unit, which we denote $j = \sqrt{-1}$. Note that the letter “j” is used in
electrical engineering instead of the letter “i” commonly used in the mathematical
literature to avoid confusion with the letter used for electrical currents.

We let $\mathbb{C}$ denote the set of all complex numbers. Any complex number
$c \in \mathbb{C}$ can be decomposed as

$$
c = a + jb \tag{2.1}
$$

for some real numbers $a, b \in \mathbb{R}$. In this case, $a$ is the real part of $c$, while $b$ is
the imaginary part of $c$. We will let $\Re(\cdot)$ be the function that outputs the real
part of its input, while $\Im(\cdot)$ is the function that outputs the imaginary part.
Hence, if $c = a + jb$, it follows that $\Re(c) = a$ and $\Im(c) = b$.

The representation in (2.1) is called the Cartesian form. Instead of decomposing
a complex number $c$ into its real and imaginary parts, we can use
the polar form to decompose it using the magnitude and argument. More
precisely,

$$
c = |c|\, e^{j\arg(c)}, \tag{2.2}
$$
where $|c| = \sqrt{a^2 + b^2} \geq 0$ is the magnitude (also known as absolute value),
describing the length of the vector $[a, b]^{\mathrm{T}}$, and the argument $\arg(c) \in [-\pi, \pi)$
is the angle of that vector in $\mathbb{R}^2$. The polar form contains Euler’s number
$e \approx 2.71828$ and makes use of the complex exponential function

$$
e^{jx} = \cos(x) + j\sin(x), \tag{2.3}
$$

where the real and imaginary parts contain the cosine and sine of the argument
$x \in \mathbb{R}$, respectively. This relation is known as Euler’s formula.
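Python's standard `cmath` module provides the quantities in (2.1)–(2.3) directly. A minimal sketch with an arbitrary example value; note that `cmath.phase` returns the argument in $[-\pi, \pi]$ and gives $\pi$ for negative real numbers with a positive-zero imaginary part, so it can differ from the convention $[-\pi, \pi)$ used here on the negative real axis.

```python
import cmath
import math

c = 3 + 4j                                   # arbitrary example value

# Cartesian form (2.1): real and imaginary parts
a, b = c.real, c.imag

# Polar form (2.2): magnitude |c| and argument arg(c)
magnitude, argument = abs(c), cmath.phase(c)
assert math.isclose(magnitude, math.hypot(a, b))     # |c| = sqrt(a^2 + b^2)
r, phi = cmath.polar(c)                              # the same pair in one call
assert math.isclose(r, magnitude) and math.isclose(phi, argument)

# Euler's formula (2.3): e^{jx} = cos(x) + j sin(x), here with x = arg(c)
euler = math.cos(argument) + 1j * math.sin(argument)
assert cmath.isclose(cmath.exp(1j * argument), euler)

# Going back from the polar to the Cartesian form recovers c
reconstructed = magnitude * euler
assert cmath.isclose(reconstructed, c)
assert cmath.isclose(cmath.rect(magnitude, argument), c)
print(magnitude, argument)
```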

The different ways to represent a complex number are illustrated in Fig- 
ure 2.1. From the definition of the sine and cosine functions, we can also 
establish the relation 


$$
c = \underbrace{|c|\cos\big(\arg(c)\big)}_{=a} + j\underbrace{|c|\sin\big(\arg(c)\big)}_{=b} \tag{2.4}
$$


between the Cartesian and polar forms. Hence, when considering signals, |c| 
can represent the magnitude/amplitude while arg(c) can represent the phase. 

The complex conjugate is a vital operation when considering complex
numbers. The complex conjugate of $c = a + jb$ is denoted as $c^*$ and computed
by switching the sign of the imaginary part: $c^* = a - jb$. This is equivalent to
switching the sign of the argument: $c^* = |c|e^{-j\arg(c)}$. Note that

$$
cc^* = (a + jb)(a - jb) = a^2 - jab + jab - j^2b^2 = a^2 + b^2 = |c|^2. \tag{2.5}
$$

Hence, we can compute the squared magnitude of a complex number by
multiplying it with its complex conjugate. We can also extract the real and
imaginary parts by adding $c$ and $c^*$ with different scaling factors:

$$
\frac{1}{2}(c + c^*) = \frac{1}{2}(a + jb + a - jb) = a = \Re(c), \tag{2.6}
$$

$$
\frac{1}{2j}(c - c^*) = \frac{1}{2j}(a + jb - a + jb) = b = \Im(c). \tag{2.7}
$$
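Identities (2.5)–(2.7) are easy to verify with Python's built-in complex type, whose `conjugate()` method switches the sign of the imaginary part. A minimal sketch with an arbitrary example value:

```python
import cmath
import math

c = 2 - 5j                       # arbitrary example value
c_conj = c.conjugate()           # switches the sign of the imaginary part

# (2.5): c times its conjugate is the (real) squared magnitude
assert math.isclose((c * c_conj).real, abs(c) ** 2)
assert (c * c_conj).imag == 0

# The conjugate also switches the sign of the argument
assert cmath.isclose(c_conj, abs(c) * cmath.exp(-1j * cmath.phase(c)))

# (2.6) and (2.7): extract the real and imaginary parts
assert math.isclose(((c + c_conj) / 2).real, c.real)
assert math.isclose(((c - c_conj) / 2j).real, c.imag)
```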


The complex exponential function is the essential building block to create
sinusoids oscillating at a specified frequency $f_c$. If $x$ is replaced by $2\pi f_c t$ in
(2.3), we obtain the complex exponential $e^{j2\pi f_c t} = \cos(2\pi f_c t) + j\sin(2\pi f_c t)$,
where $t$ represents time. By following the procedure in (2.6) and (2.7), we can
extract the real and imaginary parts as

$$
\cos(2\pi f_c t) = \Re\left(e^{j2\pi f_c t}\right) = \frac{1}{2}e^{j2\pi f_c t} + \frac{1}{2}e^{-j2\pi f_c t}, \tag{2.8}
$$

$$
\sin(2\pi f_c t) = \Im\left(e^{j2\pi f_c t}\right) = \frac{1}{2j}e^{j2\pi f_c t} - \frac{1}{2j}e^{-j2\pi f_c t}. \tag{2.9}
$$
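Relations (2.8) and (2.9) can be checked on a sampled time grid. A minimal NumPy sketch; the frequency `fc` and the grid are arbitrary assumptions.

```python
import numpy as np

fc = 1e3                                    # assumed frequency (Hz)
t = np.linspace(0, 3 / fc, 301)             # three periods
e_pos = np.exp(1j * 2 * np.pi * fc * t)     # positive-frequency exponential
e_neg = np.exp(-1j * 2 * np.pi * fc * t)    # negative-frequency exponential

# (2.8) and (2.9): cosine and sine need both the positive and negative frequency
assert np.allclose(np.cos(2 * np.pi * fc * t), (e_pos + e_neg) / 2)
assert np.allclose(np.sin(2 * np.pi * fc * t), (e_pos - e_neg) / (2j))

# The same functions are the real and imaginary parts of e_pos alone
assert np.allclose(np.cos(2 * np.pi * fc * t), e_pos.real)
assert np.allclose(np.sin(2 * np.pi * fc * t), e_pos.imag)
```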


The unique aspect of the complex exponential is that it only contains the 
frequency $f_c$, and no other frequencies. Since the cosine and sine functions




[Plot: the complex number $c$ in the plane spanned by the real part (horizontal axis) and the imaginary part (vertical axis).]

Figure 2.1: The complex number $c$ can be equivalently represented by two real-valued numbers.
The Cartesian form is $c = \Re(c) + j\Im(c)$, with the real part $\Re(c)$ and imaginary part $\Im(c)$. The
polar form is $c = |c|e^{j\arg(c)}$, where $|c|$ is the magnitude and $\arg(c)$ is the argument.


are created as linear combinations of both $e^{j2\pi f_c t}$ and $e^{-j2\pi f_c t}$, these signals
are said to contain both the positive frequency $f_c$ and the negative frequency
$-f_c$. Any real-valued signal contains a range of positive frequencies and
the corresponding negative ones. We will continue to study the frequency 
representation of signals in Sections 2.3 and 2.8. 


Example 2.1. Let c = a + jb be an arbitrary complex number. Show that the 
sinusoid acos(t) + bsin(t) with the time variable t can be written as a single 
cosine function, using the polar form c = |eļe aslo). 

The sinusoid can be rewritten as 


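$$ \begin{aligned} a\cos(t) + b\sin(t) &= \Re\big((\cos(t) + j\sin(t))(a - jb)\big) \\ &= \Re\left(e^{jt}c^*\right) = \Re\left(e^{jt}|c|e^{-j\arg(c)}\right) = |c|\,\Re\left(e^{j(t - \arg(c))}\right) \\ &= |c|\cos\big(t - \arg(c)\big). \end{aligned} \tag{2.10} $$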


This shows how the amplitude and phase of a sinusoid can be represented by a 
complex number, which is a primary reason for using them in communications. 
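A quick numerical check of (2.10) can be sketched as follows. The coefficients `a` and `b` are hypothetical, and `cmath.phase` from the standard library is used to compute $\arg(c)$ in radians.

```python
import cmath
import math

a, b = 1.5, -2.0                 # hypothetical coefficients of the sinusoid
c = complex(a, b)                # c = a + jb
amplitude = abs(c)               # |c|
phase = cmath.phase(c)           # arg(c), in radians

max_error = 0.0
for t in [0.0, 0.4, 1.3, 2.7, -5.1]:
    lhs = a * math.cos(t) + b * math.sin(t)      # left-hand side of (2.10)
    rhs = amplitude * math.cos(t - phase)        # right-hand side of (2.10)
    max_error = max(max_error, abs(lhs - rhs))

print(amplitude, phase, max_error)   # max_error is at rounding level
```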


2.1.1 Vector Analysis 


Vectors and matrices are commonly used when describing systems with multi- 
ple antennas, where each entry is related to one of the antennas. The entries 
will be complex in most of the chapters of this book. Thus, we will briefly 
review the foundational linear algebra results in the complex domain. 
An $M$-dimensional vector containing the complex entries $x_1, \ldots, x_M \in \mathbb{C}$ can be expressed as

$$ \mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_M \end{bmatrix}. \tag{2.11} $$
[Figure 2.2: the vector $\mathbf{x}$ drawn in a plane with axes "First entry" and "Second entry"]

Figure 2.2: The complex vector $\mathbf{x}$ has a length determined by the norm $\|\mathbf{x}\|$. The unit-length vector $\mathbf{x}/\|\mathbf{x}\|$ points in the same direction as $\mathbf{x}$.


We denote vectors using lower-case bold-faced letters, such as $\mathbf{x}$. The entries are expressed using the same letter and a subscript indicating the entry number, such as $x_m$ for the $m$th entry of $\mathbf{x}$. Since the vector belongs to the $M$-dimensional complex vector space $\mathbb{C}^M$, we can write that $\mathbf{x} \in \mathbb{C}^M$.

A vector x has a norm that describes the distance between the origin and 
the point x in the vector space. Since it describes the length, it can be viewed 
as the generalization of the magnitude to vectors. The Euclidean norm is 
denoted by ||x|| and is computed as 


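$$ \|\mathbf{x}\| = \sqrt{|x_1|^2 + \cdots + |x_M|^2} = \sqrt{\sum_{m=1}^{M} |x_m|^2}. \tag{2.12} $$

By using the norm, we can decompose the vector as

$$ \mathbf{x} = \underbrace{\|\mathbf{x}\|}_{\text{Length}} \, \underbrace{\frac{\mathbf{x}}{\|\mathbf{x}\|}}_{\text{Direction}}, \tag{2.13} $$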


where the second term is the length-one vector pointing in the same direction 
as x. Figure 2.2 illustrates how an arbitrary vector x is described by its 
length/norm ||x|| and the direction x/||x||. There will be occasions in this 
book where we want to select two vectors that point in the same direction 
but have different norms, in which case we can utilize this decomposition. 
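The norm (2.12) and the length/direction decomposition (2.13) can be sketched with plain Python lists of complex numbers. The vector below is a hypothetical example; the last lines build a second vector with the same direction but norm 3, as described above.

```python
import math

def norm(x):
    """Euclidean norm (2.12) of a complex vector stored as a Python list."""
    return math.sqrt(sum(abs(xm) ** 2 for xm in x))

x = [1 + 1j, 2 - 1j, -0.5j]              # hypothetical vector in C^3
length = norm(x)                          # ||x||
direction = [xm / length for xm in x]    # x / ||x||, unit length

# (2.13): length times direction gives back x
reconstructed = [length * d for d in direction]

# A vector with the same direction but norm 3
y = [3.0 * d for d in direction]
print(length, norm(direction), norm(y))
```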
All the vectors in this book are column matrices, meaning they have 
one column and multiple rows. When dealing with matrices, one can switch 
the meaning of rows and columns using the operation called transpose. The 
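transpose of an arbitrary vector $\mathbf{x}$ is denoted as $\mathbf{x}^{\mathrm{T}}$. For example, the transpose of (2.11) is

$$ \mathbf{x}^{\mathrm{T}} = \begin{bmatrix} x_1 & \cdots & x_M \end{bmatrix}, \tag{2.14} $$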


which is a row matrix containing the same entries. 
When dealing with complex vectors, there is another type of transpose that also includes the complex conjugate operation:

$$ \mathbf{x}^{\mathrm{H}} = \begin{bmatrix} x_1^* & \cdots & x_M^* \end{bmatrix}. \tag{2.15} $$

This will be called the conjugate transpose in this book but is also known as the Hermitian transpose, which explains the letter $\mathrm{H}$. A third operation that we will use is the complex conjugation of a vector (or matrix), which is defined as taking the complex conjugate of the individual entries:

$$ \mathbf{x}^* = \begin{bmatrix} x_1^* \\ \vdots \\ x_M^* \end{bmatrix}. \tag{2.16} $$

The conjugate transpose is simply a combination of the conventional transpose and the conjugation, $\mathbf{x}^{\mathrm{H}} = (\mathbf{x}^{\mathrm{T}})^*$, but it occurs so commonly in complex vector analysis that it deserves its own notation.
The inner product (or dot product) between two $M$-dimensional complex vectors $\mathbf{x}$ and $\mathbf{y} = [y_1, \ldots, y_M]^{\mathrm{T}}$ is defined using the conjugate transpose as

$$ \mathbf{x}^{\mathrm{H}}\mathbf{y} = \sum_{m=1}^{M} x_m^* y_m. \tag{2.17} $$


The magnitude $|\mathbf{x}^{\mathrm{H}}\mathbf{y}|$ of the inner product becomes larger the more similar the directions of the two vectors are and smaller when the directions are very different. This statement can be quantified by the Cauchy-Schwarz inequality, which states that

$$ |\mathbf{x}^{\mathrm{H}}\mathbf{y}| \leq \|\mathbf{x}\|\,\|\mathbf{y}\| \tag{2.18} $$

with equality if and only if $\mathbf{x}$ and $\mathbf{y}$ are parallel (i.e., $\mathbf{x} = c\mathbf{y}$ for some non-zero $c \in \mathbb{C}$). The upper bound is the product of the lengths of the two vectors. Figure 2.3 illustrates how the inner product varies depending on the directions of the vectors, with the parallel vectors $\mathbf{x}, \mathbf{y}_1$ achieving the upper bound in the Cauchy-Schwarz inequality and the orthogonal vectors $\mathbf{x}, \mathbf{y}_3$ having an inner product equal to zero. The latter vectors span a two-dimensional plane in the $M$-dimensional vector space and are separated by $90^\circ$ in that plane.
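The inner product (2.17) and the Cauchy-Schwarz inequality (2.18) can be explored numerically with a minimal sketch like the one below. The random vectors, the seed, and the scalar $c = 0.7 - 2.1j$ are hypothetical choices; the last pair of vectors in $\mathbb{C}^2$ is an example of two orthogonal vectors.

```python
import math
import random

def inner(x, y):
    """Inner product x^H y from (2.17)."""
    return sum(xm.conjugate() * ym for xm, ym in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))

def random_vector(M):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(M)]

random.seed(1)
M = 4
x = random_vector(M)

# Cauchy-Schwarz (2.18) checked for many random vectors y
holds = all(abs(inner(x, y)) <= norm(x) * norm(y)
            for y in (random_vector(M) for _ in range(1000)))

# A parallel vector y1 = c x attains the upper bound
y1 = [(0.7 - 2.1j) * v for v in x]
ratio = abs(inner(x, y1)) / (norm(x) * norm(y1))   # equals 1 up to rounding

# An orthogonal pair in C^2: the inner product is zero
x2, y3 = [1 + 0j, 1j], [1 + 0j, -1j]
print(holds, ratio, inner(x2, y3))
```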


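Example 2.2. Suppose we are given a vector $\mathbf{x} \in \mathbb{C}^M$ and can select the non-zero vector $\mathbf{y} \in \mathbb{C}^M$ freely. Which selections will maximize or minimize $|\mathbf{x}^{\mathrm{H}}\mathbf{y}|/\|\mathbf{y}\|$?

The minimum is 0 and is achieved by any non-zero vector $\mathbf{y}$ that is orthogonal to $\mathbf{x}$. The Cauchy-Schwarz inequality implies that the maximum, $\|\mathbf{x}\|$, is obtained for $\mathbf{y} = c\mathbf{x}$ for any non-zero $c \in \mathbb{C}$.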


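When one of the vectors has unit length, the inner product can also be interpreted as an orthogonal projection onto that vector. Suppose $\mathbf{x}$ has unit length (i.e., $\|\mathbf{x}\| = 1$) and let $\mathbf{y}$ be any other vector of the same dimension. The magnitude $|\mathbf{x}^{\mathrm{H}}\mathbf{y}|$ of their inner product is also the length of the vector

$$ \mathbf{y}_{\mathrm{proj},\mathbf{x}} = (\mathbf{x}^{\mathrm{H}}\mathbf{y})\mathbf{x}, \tag{2.19} $$

which is the orthogonal projection of $\mathbf{y}$ onto the direction pointed out by $\mathbf{x}$. This projection is illustrated in Figure 2.4. From this example, we notice that only the part of $\mathbf{y}$ that is parallel to $\mathbf{x}$ affects the inner product; thus, many different vectors $\mathbf{y}$ have the same inner product with $\mathbf{x}$. Moreover, $\mathbf{y}_{\mathrm{proj},\mathbf{x}}$ is orthogonal to $\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}}$, since $\mathbf{y}_{\mathrm{proj},\mathbf{x}}^{\mathrm{H}}(\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}}) = (\mathbf{x}^{\mathrm{H}}\mathbf{y})^*\left(\mathbf{x}^{\mathrm{H}}\mathbf{y} - (\mathbf{x}^{\mathrm{H}}\mathbf{y})\mathbf{x}^{\mathrm{H}}\mathbf{x}\right) = 0$ when $\|\mathbf{x}\| = 1$.

[Figure 2.3: the vector $\mathbf{x}$ together with $\mathbf{y}_1$ (parallel to $\mathbf{x}$), $\mathbf{y}_2$, and $\mathbf{y}_3$ (orthogonal to $\mathbf{x}$)]

Figure 2.3: The magnitude of the inner product $|\mathbf{x}^{\mathrm{H}}\mathbf{y}_i|$ between two vectors depends on how similar their directions are. Parallel vectors give the largest value and achieve the upper bound in the Cauchy-Schwarz inequality in (2.18): $|\mathbf{x}^{\mathrm{H}}\mathbf{y}_1| = \|\mathbf{x}\|\,\|\mathbf{y}_1\|$. Orthogonal vectors give $\mathbf{x}^{\mathrm{H}}\mathbf{y}_3 = 0$, while other vectors give a number in between zero and the upper bound.

[Figure 2.4: the unit-length vector $\mathbf{x}$ (with $\|\mathbf{x}\| = 1$), a vector $\mathbf{y}$, and its projection $\mathbf{y}_{\mathrm{proj},\mathbf{x}} = (\mathbf{x}^{\mathrm{H}}\mathbf{y})\mathbf{x}$]

Figure 2.4: If $\mathbf{x}$ is a unit-length vector, the inner product $\mathbf{x}^{\mathrm{H}}\mathbf{y}$ is tightly connected to the orthogonal projection of $\mathbf{y}$ onto $\mathbf{x}$. The orthogonal projection is $\mathbf{y}_{\mathrm{proj},\mathbf{x}} = (\mathbf{x}^{\mathrm{H}}\mathbf{y})\mathbf{x}$ and has the length $|\mathbf{x}^{\mathrm{H}}\mathbf{y}|$.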
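The projection (2.19) can be computed directly, as in this minimal sketch. The vectors are hypothetical; the code normalizes $\mathbf{x}$ first, then checks that the projection has length $|\mathbf{x}^{\mathrm{H}}\mathbf{y}|$ and is orthogonal to the residual $\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}}$.

```python
import math

def inner(x, y):
    """Inner product x^H y."""
    return sum(xm.conjugate() * ym for xm, ym in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))

x = [1 + 0j, 1j, 0j]
x = [v / norm(x) for v in x]          # make x unit length
y = [2 - 1j, 0.5 + 3j, -1 + 1j]       # hypothetical vector of the same dimension

coef = inner(x, y)                     # x^H y
y_proj = [coef * v for v in x]         # (2.19)
residual = [yv - pv for yv, pv in zip(y, y_proj)]

print(norm(y_proj), abs(coef))         # equal: the projection has length |x^H y|
print(abs(inner(y_proj, residual)))    # close to 0: projection orthogonal to residual
```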
A special case where the upper bound is achieved is when the inner product is computed between $\mathbf{x}$ and itself:

$$ \mathbf{x}^{\mathrm{H}}\mathbf{x} = \sum_{m=1}^{M} x_m^* x_m = \sum_{m=1}^{M} |x_m|^2 = \|\mathbf{x}\|^2, \tag{2.20} $$

where the last equality follows from (2.12). Hence, the squared norm of a vector $\mathbf{x}$ can be computed using the inner product. This is a generalization of (2.5), where we computed the squared magnitude of a complex number by multiplying it with its complex conjugate.

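By utilizing (2.20), the squared norm of the sum of two arbitrary vectors $\mathbf{x}$ and $\mathbf{y}$ (of the same dimension) can be expanded as

$$ \begin{aligned} \|\mathbf{x} + \mathbf{y}\|^2 &= \mathbf{x}^{\mathrm{H}}\mathbf{x} + \mathbf{y}^{\mathrm{H}}\mathbf{y} + \mathbf{x}^{\mathrm{H}}\mathbf{y} + \mathbf{y}^{\mathrm{H}}\mathbf{x} \\ &= \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\Re(\mathbf{x}^{\mathrm{H}}\mathbf{y}) \end{aligned} \tag{2.21} $$

by utilizing the fact that $\mathbf{y}^{\mathrm{H}}\mathbf{x} = (\mathbf{x}^{\mathrm{H}}\mathbf{y})^*$, so the two inner products have the same real part but imaginary parts with opposite signs.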
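The expansion (2.21) is easy to confirm numerically. In this minimal sketch the two vectors are hypothetical; the last line also shows that $\mathbf{x}^{\mathrm{H}}\mathbf{y}$ and $\mathbf{y}^{\mathrm{H}}\mathbf{x}$ are complex conjugates of each other.

```python
def inner(x, y):
    """Inner product x^H y."""
    return sum(xm.conjugate() * ym for xm, ym in zip(x, y))

def norm_sq(x):
    """Squared norm ||x||^2."""
    return sum(abs(v) ** 2 for v in x)

x = [1 + 2j, -1j, 3 + 0j]            # hypothetical vectors of the same dimension
y = [0.5 - 1j, 2 + 2j, -1 + 0.5j]
s = [a + b for a, b in zip(x, y)]    # x + y

lhs = norm_sq(s)
rhs = norm_sq(x) + norm_sq(y) + 2 * inner(x, y).real   # right-hand side of (2.21)
print(lhs, rhs)
print(inner(x, y), inner(y, x))      # complex conjugates of each other
```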


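Example 2.3. Consider a set of $K$ unit-length vectors $\mathbf{x}_1, \ldots, \mathbf{x}_K \in \mathbb{C}^M$ that are mutually orthogonal, where $K \leq M$. Compute the squared norm of the vector $\mathbf{y} = \sum_{k=1}^{K} c_k\mathbf{x}_k$, where $c_1, \ldots, c_K \in \mathbb{C}$ are scalar coefficients.

From the provided information, we have $\|\mathbf{x}_k\| = 1$ for $k = 1, \ldots, K$, and $\mathbf{x}_k^{\mathrm{H}}\mathbf{x}_m = 0$ for $k \neq m$. We use these properties to expand the squared norm as

$$ \|\mathbf{y}\|^2 = \mathbf{y}^{\mathrm{H}}\mathbf{y} = \sum_{k=1}^{K} c_k^*\mathbf{x}_k^{\mathrm{H}} \sum_{m=1}^{K} c_m\mathbf{x}_m = \sum_{k=1}^{K} |c_k|^2 \underbrace{\mathbf{x}_k^{\mathrm{H}}\mathbf{x}_k}_{=\|\mathbf{x}_k\|^2 = 1} + \sum_{k=1}^{K}\sum_{\substack{m=1 \\ m \neq k}}^{K} c_k^* c_m \underbrace{\mathbf{x}_k^{\mathrm{H}}\mathbf{x}_m}_{=0} = \sum_{k=1}^{K} |c_k|^2. \tag{2.22} $$

We notice that $\|\mathbf{y}\|^2$ is the sum of the squared magnitudes of the coefficients, which determine the length of $\mathbf{y}$ in each of the $K$ orthogonal directions $\mathbf{x}_1, \ldots, \mathbf{x}_K$.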
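Example 2.3 can be reproduced numerically with a concrete set of orthonormal vectors. As a hypothetical choice, the sketch below uses the first $K$ columns of the normalized $M \times M$ discrete Fourier transform (DFT) matrix, which are mutually orthogonal and have unit length; the coefficients are arbitrary.

```python
import cmath
import math

M, K = 4, 3
# Hypothetical orthonormal vectors: the first K columns of the normalized
# M x M DFT matrix, x_k[m] = exp(j 2 pi k m / M) / sqrt(M).
vectors = [[cmath.exp(2j * math.pi * k * m / M) / math.sqrt(M) for m in range(M)]
           for k in range(K)]
coeffs = [1 - 1j, 0.5 + 0j, -2 + 3j]       # hypothetical c_1, ..., c_K

# y = sum_k c_k x_k, built entry by entry
y = [sum(coeffs[k] * vectors[k][m] for k in range(K)) for m in range(M)]

norm_sq_y = sum(abs(v) ** 2 for v in y)           # ||y||^2
sum_sq_coeffs = sum(abs(c) ** 2 for c in coeffs)  # sum_k |c_k|^2, see (2.22)
print(norm_sq_y, sum_sq_coeffs)                   # equal up to rounding
```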


The sum of vectors, each multiplied by a scalar coefficient, is known as a linear combination. If $\mathbf{x}_1, \ldots, \mathbf{x}_K \in \mathbb{C}^M$ are $K$ vectors and $c_1, \ldots, c_K \in \mathbb{C}$ are $K$ scalar coefficients, then the linear combination of the vectors using those coefficients is

$$ c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_K\mathbf{x}_K = \sum_{k=1}^{K} c_k\mathbf{x}_k. \tag{2.23} $$

This concept is helpful in making geometrical comparisons of vectors in high-dimensional situations where we cannot draw them on paper.


Definition 2.1. The vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K$ are said to be linearly independent if the system of equations

$$ c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_K\mathbf{x}_K = \mathbf{0} \tag{2.24} $$

only has the solution $c_1 = \cdots = c_K = 0$. If other solutions exist, the vectors are said to be linearly dependent.
Any two non-zero vectors are linearly independent unless they are parallel; thus, linear independence is a broader condition than orthogonality. For example, we can pick any two of $\mathbf{y}_1$, $\mathbf{y}_2$, and $\mathbf{y}_3$ in Figure 2.3 and get a set of linearly independent vectors. However, the set of all three vectors in the figure is linearly dependent because $\mathbf{y}_2$ points partially in the direction of $\mathbf{y}_1$ and partially in the direction of $\mathbf{y}_3$. This is a typical situation when considering two-dimensional vectors, as in the figure: if we pick more than two vectors, they must always be linearly dependent because they share the same two dimensions. More generally, any set of more than $M$ vectors that are $M$-dimensional must be linearly dependent, but we can find a set with exactly $M$ linearly independent vectors. Moreover, any set of non-zero, pairwise orthogonal vectors can be shown to be linearly independent.


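Example 2.4. Consider the vector $\mathbf{y} = \sum_{k=1}^{M} c_k\mathbf{x}_k$, constructed using the mutually orthogonal unit-length vectors $\mathbf{x}_1, \ldots, \mathbf{x}_M \in \mathbb{C}^M$ and scalar coefficients $c_1, \ldots, c_M \in \mathbb{C}$. Let $\mathbf{y}_{\mathrm{proj},\mathbf{x}_m}$ denote the orthogonal projection of $\mathbf{y}$ onto $\mathbf{x}_m$, which is the $m$th of the unit-length vectors. What are the squared norms of $\mathbf{y}_{\mathrm{proj},\mathbf{x}_m}$ and of the residual vector $\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}_m}$?

The vector $\mathbf{y}_{\mathrm{proj},\mathbf{x}_m}$ is computed similarly to (2.19) as

$$ \mathbf{y}_{\mathrm{proj},\mathbf{x}_m} = (\mathbf{x}_m^{\mathrm{H}}\mathbf{y})\mathbf{x}_m = \Bigg(\sum_{k=1}^{M} c_k \underbrace{\mathbf{x}_m^{\mathrm{H}}\mathbf{x}_k}_{= 1 \text{ if } m = k,\; 0 \text{ if } m \neq k}\Bigg)\mathbf{x}_m = c_m\mathbf{x}_m. \tag{2.25} $$

Hence, we obtain $\|\mathbf{y}_{\mathrm{proj},\mathbf{x}_m}\|^2 = |c_m|^2$, which is the squared magnitude of the coefficient associated with $\mathbf{x}_m$. The squared norm of the residual $\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}_m}$ becomes

$$ \|\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}_m}\|^2 = \Bigg\|\sum_{k=1}^{M} c_k\mathbf{x}_k - c_m\mathbf{x}_m\Bigg\|^2 = \Bigg\|\sum_{\substack{k=1 \\ k \neq m}}^{M} c_k\mathbf{x}_k\Bigg\|^2 = \sum_{\substack{k=1 \\ k \neq m}}^{M} |c_k|^2, \tag{2.26} $$

where the last step follows from (2.22). This is the sum of the squared magnitudes of all the other coefficients. We notice that $\|\mathbf{y}_{\mathrm{proj},\mathbf{x}_m}\|^2 + \|\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}_m}\|^2 = \|\mathbf{y}\|^2$, which is a consequence of the fact that $\mathbf{y}_{\mathrm{proj},\mathbf{x}_m}$ is orthogonal to $\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}_m}$, so the cross term in (2.21) vanishes.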
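The results of Example 2.4 can be checked with a minimal sketch. As a hypothetical choice of orthonormal vectors, it uses all $M$ columns of the normalized DFT matrix; any other orthonormal set would serve equally well. Python indices start at 0, so `m = 2` refers to the third vector.

```python
import cmath
import math

M = 4
# Hypothetical orthonormal vectors x_1, ..., x_M: columns of the normalized DFT matrix
vectors = [[cmath.exp(2j * math.pi * k * i / M) / math.sqrt(M) for i in range(M)]
           for k in range(M)]
coeffs = [2 + 0j, -1j, 0.5 + 0.5j, 1 - 2j]     # hypothetical c_1, ..., c_M
y = [sum(coeffs[k] * vectors[k][i] for k in range(M)) for i in range(M)]

def norm_sq(v):
    """Squared norm ||v||^2."""
    return sum(abs(e) ** 2 for e in v)

m = 2                                           # 0-based index of x_m
coef = sum(vectors[m][i].conjugate() * y[i] for i in range(M))   # x_m^H y
y_proj = [coef * e for e in vectors[m]]         # (2.25): equals c_m x_m
residual = [a - b for a, b in zip(y, y_proj)]

print(coef, coeffs[m])                          # equal up to rounding
print(norm_sq(y_proj), norm_sq(residual), norm_sq(y))
```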


An orthonormal basis in $\mathbb{C}^M$ is a set of $M$ vectors $\mathbf{b}_1, \ldots, \mathbf{b}_M$ that satisfies the following two conditions:

1. The vectors are mutually orthogonal, so that their inner products are $\mathbf{b}_i^{\mathrm{H}}\mathbf{b}_j = 0$ for any choice of $i, j \in \{1, \ldots, M\}$ such that $i \neq j$;

2. The vectors have length one, so that their norm is $\|\mathbf{b}_i\| = 1$ for all $i \in \{1, \ldots, M\}$.
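There are many examples of orthonormal bases. One way of constructing one is to let $\mathbf{b}_i$ be 1 in entry $i$ and 0 elsewhere. For $M = 4$, this results in

$$ \mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{b}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{b}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{b}_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}. \tag{2.27} $$

A common reason for defining an orthonormal basis is that any other vector $\mathbf{x} \in \mathbb{C}^M$ can be written as a linear combination of the $M$ basis vectors: $\mathbf{x} = \sum_{i=1}^{M} c_i\mathbf{b}_i$ for some coefficients $c_1, \ldots, c_M$. This follows from the fact that any set of $M + 1$ vectors is linearly dependent in $\mathbb{C}^M$. By the same computation as in (2.25), the coefficients are given by $c_i = \mathbf{b}_i^{\mathrm{H}}\mathbf{x}$.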
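The expansion of a vector in an orthonormal basis can be sketched as follows. To make the example less trivial than the standard basis in (2.27), it uses a hypothetical orthonormal basis of $\mathbb{C}^2$, $\mathbf{b}_1 = [1, j]^{\mathrm{T}}/\sqrt{2}$ and $\mathbf{b}_2 = [1, -j]^{\mathrm{T}}/\sqrt{2}$, computes the coefficients as $c_i = \mathbf{b}_i^{\mathrm{H}}\mathbf{x}$, and rebuilds $\mathbf{x}$ from them.

```python
import math

s = 1 / math.sqrt(2)
# Hypothetical orthonormal basis of C^2 (not the standard basis in (2.27))
b1 = [s + 0j, 1j * s]
b2 = [s + 0j, -1j * s]
basis = [b1, b2]

def inner(x, y):
    """Inner product x^H y."""
    return sum(xm.conjugate() * ym for xm, ym in zip(x, y))

x = [3 + 0j, 1 - 2j]                                  # arbitrary vector
coeffs = [inner(b, x) for b in basis]                 # c_i = b_i^H x
rebuilt = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(2)]

print(inner(b1, b2), inner(b1, b1), inner(b2, b2))    # 0, 1, 1 up to rounding
print(rebuilt)                                        # equals x up to rounding
```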


2.1.2 Matrix Analysis 


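A vector is a special case of a matrix. An $M \times K$ matrix has $M$ rows and $K$ columns, and contains $MK$ entries. Let $h_{m,k} \in \mathbb{C}$ denote the entry at the $m$th row in the $k$th column. The full matrix can then be expressed as

$$ \mathbf{H} = \begin{bmatrix} h_{1,1} & \cdots & h_{1,K} \\ \vdots & \ddots & \vdots \\ h_{M,1} & \cdots & h_{M,K} \end{bmatrix}. \tag{2.28} $$

We denote matrices using upper-case bold-faced letters, such as $\mathbf{H}$. The space of all complex matrices of size $M \times K$ is denoted as $\mathbb{C}^{M \times K}$; thus, we can write that $\mathbf{H} \in \mathbb{C}^{M \times K}$. The transpose and conjugate transpose are computed as

$$ \mathbf{H}^{\mathrm{T}} = \begin{bmatrix} h_{1,1} & \cdots & h_{M,1} \\ \vdots & \ddots & \vdots \\ h_{1,K} & \cdots & h_{M,K} \end{bmatrix}, \quad \mathbf{H}^{\mathrm{H}} = \begin{bmatrix} h_{1,1}^* & \cdots & h_{M,1}^* \\ \vdots & \ddots & \vdots \\ h_{1,K}^* & \cdots & h_{M,K}^* \end{bmatrix}, \tag{2.29} $$

respectively. Note that $\mathbf{H}^{\mathrm{T}}$ is obtained by flipping the matrix over its diagonal, while $\mathbf{H}^{\mathrm{H}}$ is obtained by both flipping the matrix and replacing each entry by its complex conjugate. Both operations change the dimensions of the matrix: $\mathbf{H}^{\mathrm{T}}$ and $\mathbf{H}^{\mathrm{H}}$ belong to the space $\mathbb{C}^{K \times M}$ of all complex $K \times M$ matrices. Only in the square matrix case of $M = K$ are the dimensions unchanged.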

The columns of a matrix are important when analyzing its properties. Let $\mathbf{h}_1, \ldots, \mathbf{h}_K$ denote the $K$ columns of an $M \times K$ matrix $\mathbf{H}$. We notice that each column is an $M$-dimensional vector. The matrix-vector product between the matrix $\mathbf{H}$ and a $K$-dimensional vector $\mathbf{c} = [c_1, \ldots, c_K]^{\mathrm{T}}$ is denoted as $\mathbf{H}\mathbf{c}$ and is an $M$-dimensional vector computed as

$$ \mathbf{H}\mathbf{c} = \begin{bmatrix} h_{1,1}c_1 + \cdots + h_{1,K}c_K \\ \vdots \\ h_{M,1}c_1 + \cdots + h_{M,K}c_K \end{bmatrix} = c_1\mathbf{h}_1 + \cdots + c_K\mathbf{h}_K. \tag{2.30} $$

This is the linear combination of the column vectors of $\mathbf{H}$ using the corresponding entries of $\mathbf{c}$ as coefficients. Hence, the directions of the columns determine which directions the vector $\mathbf{H}\mathbf{c}$ can have. In particular, we can never get a non-zero vector that is orthogonal to all the columns of $\mathbf{H}$.
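The two sides of (2.30) can be compared in a short sketch. The $3 \times 2$ matrix and the vector $\mathbf{c}$ below are hypothetical; the matrix is stored as a list of rows, and its columns are extracted explicitly to form the linear combination $c_1\mathbf{h}_1 + c_2\mathbf{h}_2$.

```python
H = [[1 + 1j, 2 + 0j],
     [0 - 1j, 3 + 0j],
     [2 + 0j, -1 + 1j]]     # hypothetical 3 x 2 matrix (M = 3, K = 2), row by row
c = [0.5 + 0j, -1j]         # hypothetical K-dimensional vector
M, K = len(H), len(c)

# Left side of (2.30): entry m is row m of H multiplied by c
Hc_rows = [sum(H[m][k] * c[k] for k in range(K)) for m in range(M)]

# Right side of (2.30): linear combination c_1 h_1 + ... + c_K h_K of the columns
columns = [[H[m][k] for m in range(M)] for k in range(K)]
Hc_cols = [sum(c[k] * columns[k][m] for k in range(K)) for m in range(M)]

print(Hc_rows)
print(Hc_cols)
```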

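A square matrix where all the off-diagonal entries are zero is called a diagonal matrix. If the diagonal of an $M \times M$ diagonal matrix $\mathbf{D}$ contains the entries $d_1, \ldots, d_M$, then the matrix is

$$ \mathbf{D} = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & d_M \end{bmatrix} \tag{2.31} $$

and will be written in short form as $\mathbf{D} = \mathrm{diag}(d_1, \ldots, d_M)$.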


A diagonal matrix with only ones on the diagonal is known as an identity matrix. We will denote the $M \times M$ identity matrix as $\mathbf{I}_M$. The columns of an identity matrix are an orthonormal basis in $\mathbb{C}^M$, as exemplified in (2.27).

Non-diagonal square matrices can be transformed into diagonal matrices by 
a process known as diagonalization. We will summarize this process because 
it reveals several key properties of matrices, starting with the eigenvalues. 


Definition 2.2. Consider an $M \times M$ matrix $\mathbf{A}$ and a non-zero vector $\mathbf{u} \in \mathbb{C}^M$. If

$$ \mathbf{A}\mathbf{u} = \lambda\mathbf{u} \tag{2.32} $$

for some scalar $\lambda \in \mathbb{C}$, then $\mathbf{u}$ is an eigenvector of $\mathbf{A}$ with $\lambda$ being the corresponding eigenvalue.

The output of the matrix-vector product $\mathbf{A}\mathbf{u}$ is generally a rotated and stretched version of $\mathbf{u}$. The unique property of an eigenvector $\mathbf{u}$ is that it is only stretched by the scalar factor $\lambda$ (the eigenvalue). Two different matrices generally have different eigenvectors and eigenvalues.

Each $M \times M$ matrix has $M$ eigenvalues (counting repeated ones), which can be denoted as $\lambda_1, \ldots, \lambda_M$. There are two matrix operations that directly expose the eigenvalues. The first operation is the trace $\mathrm{tr}(\mathbf{A})$, which is defined as the sum of the diagonal entries of $\mathbf{A}$ but also has the property

$$ \mathrm{tr}(\mathbf{A}) = \sum_{m=1}^{M} \lambda_m. \tag{2.33} $$

The second operation is the determinant $\det(\mathbf{A})$, which has a complicated definition and can be computed in multiple ways but satisfies the property

$$ \det(\mathbf{A}) = \prod_{m=1}^{M} \lambda_m. \tag{2.34} $$

Hence, the trace and determinant are the sum and product of the eigenvalues, respectively. The determinant is zero whenever one of the eigenvalues is zero. The eigenvalue definition $\mathbf{A}\mathbf{u} = \lambda\mathbf{u}$ in (2.32) is equivalent to $(\mathbf{A} - \lambda\mathbf{I}_M)\mathbf{u} = \mathbf{0}$, which means that $\mathbf{A} - \lambda\mathbf{I}_M$ must have a zero-valued eigenvalue (with $\mathbf{u}$ as the corresponding eigenvector). Hence, $\det(\mathbf{A} - \lambda\mathbf{I}_M) = 0$, and we can use this equation to identify the eigenvalue $\lambda$. More generally, the characteristic polynomial of a matrix $\mathbf{A}$ is expressed as

$$ \det(\mathbf{A} - \lambda\mathbf{I}_M) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_M - \lambda), \tag{2.35} $$

where $\lambda$ is the variable and the determinant plays an essential role. All the $M$ eigenvalues are roots of the characteristic polynomial and vice versa. The same eigenvalue can appear multiple times in the characteristic polynomial.


Example 2.5. What are the eigenvalues of the 2 x 2 matrix 


C=) 
sof, oe (2.36) 
The characteristic polynomial of this matrix is 
ABH. is ae (es oe A) i i 
= ah SDS (A+ A= 2), (2.37) 


where we utilized the property that the determinant of a $2 \times 2$ matrix is the product of the diagonal entries minus the product of the off-diagonal entries. The roots of the characteristic polynomial are $\lambda_1 = -1$ and $\lambda_2 = 2$, which are also the eigenvalues of $\mathbf{A}$.
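The procedure in Example 2.5 can be sketched for any $2 \times 2$ matrix, whose characteristic polynomial is $\lambda^2 - \mathrm{tr}(\mathbf{A})\lambda + \det(\mathbf{A})$. The matrix below is hypothetical: it is chosen to have trace 1 and determinant $-2$, and hence the eigenvalues $-1$ and 2, but it is not necessarily the matrix in (2.36). The eigenvector formula $\mathbf{u} = [b, \lambda - a]^{\mathrm{T}}$ for $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is valid when $b \neq 0$.

```python
import cmath

def eig2x2(A):
    """Eigenvalues of a 2 x 2 matrix as roots of the characteristic polynomial
    lambda^2 - tr(A) lambda + det(A) = 0."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)   # complex sqrt handles complex roots too
    return (tr - disc) / 2, (tr + disc) / 2

# Hypothetical matrix with trace 1 and determinant -2
A = [[1, 2], [1, 0]]
lam1, lam2 = eig2x2(A)
print(lam1, lam2)

# (2.33) and (2.34): trace = sum, determinant = product of the eigenvalues
print(lam1 + lam2, lam1 * lam2)

# (2.32) with the eigenvector u = [b, lambda - a] (valid when b != 0)
for lam in (lam1, lam2):
    u = [A[0][1], lam - A[0][0]]
    Au = [A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1]]
    print(abs(Au[0] - lam * u[0]), abs(Au[1] - lam * u[1]))   # both close to 0
```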


The rank of an M x K matrix equals the maximum number of linearly 
independent columns the matrix has. The rank is also equal to the maximum 
number of linearly independent rows. The rank can take any value between 0 
and min(M, K); that is, the minimum of M and K. In the case of an M x M 
square matrix, the rank is greater than or equal to the number of non-zero 
eigenvalues. In fact, the rank is usually equal to the number of non-zero 
eigenvalues for the square matrices appearing in communications, but one can 
create counterexamples where this is not the case. Later in this section, we 
will provide additional conditions that guarantee equivalence. 

Recall from (2.30) that the matrix-vector product $\mathbf{H}\mathbf{c}$ is computed as a linear combination of the columns of $\mathbf{H}$ with coefficients from $\mathbf{c}$. Suppose we want to create a set $\mathbf{H}\mathbf{c}_1, \mathbf{H}\mathbf{c}_2, \ldots$ of linearly independent vectors (or even mutually orthogonal vectors) by multiplying $\mathbf{H}$ by different vectors $\mathbf{c}_1, \mathbf{c}_2, \ldots$. The rank of $\mathbf{H}$ limits how many such vectors we can create. The rank property
will be utilized in later chapters to quantify how many parallel data streams 
we can transmit over a communication channel, where the matrix dimensions 
represent antennas and/or frequency bands. 
Example 2.6. Let $\mathbf{c}_1, \ldots, \mathbf{c}_K \in \mathbb{C}^K$ be $K$ linearly independent vectors. For an arbitrary matrix $\mathbf{H} \in \mathbb{C}^{K \times K}$ that has rank $r$ satisfying $r < K$, show that $\mathbf{H}\mathbf{c}_1, \ldots, \mathbf{H}\mathbf{c}_K$ cannot be linearly independent.

Since the rank of $\mathbf{H}$ is $r < K$ and the number of non-zero eigenvalues is smaller than or equal to the rank, $\mathbf{H}$ must have at least $K - r$ zero-valued eigenvalues. Consequently, there must exist an eigenvector $\mathbf{x} \neq \mathbf{0}$ satisfying $\mathbf{H}\mathbf{x} = \mathbf{0}$. Since $\mathbf{c}_1, \ldots, \mathbf{c}_K$ are linearly independent, any such non-zero $\mathbf{x} \in \mathbb{C}^K$ can be expressed as $\sum_{k=1}^{K} a_k\mathbf{c}_k$ for some selection of the coefficients, with not all $a_k$ being zero. Inserting $\mathbf{x} = \sum_{k=1}^{K} a_k\mathbf{c}_k$ into $\mathbf{H}\mathbf{x} = \mathbf{0}$, we obtain

$$ \mathbf{H}\left(\sum_{k=1}^{K} a_k\mathbf{c}_k\right) = \sum_{k=1}^{K} a_k\mathbf{H}\mathbf{c}_k = \mathbf{0}. \tag{2.38} $$

According to Definition 2.1, $\mathbf{H}\mathbf{c}_1, \ldots, \mathbf{H}\mathbf{c}_K$ are linearly independent if and only if the above linear system of equations (with respect to $a_1, \ldots, a_K \in \mathbb{C}$) only has the solution $a_1 = \cdots = a_K = 0$. However, for a non-zero eigenvector $\mathbf{x}$, at least one $a_k$ is non-zero, which implies that $\mathbf{H}\mathbf{c}_1, \ldots, \mathbf{H}\mathbf{c}_K$ cannot be linearly independent if the rank of $\mathbf{H}$ is strictly less than $K$.


Square matrices can be factorized and diagonalized using the eigenvalues and eigenvectors. For brevity, we will only present this eigendecomposition in the special case of Hermitian (conjugate-symmetric) matrices, which are defined as follows.


Definition 2.3. A matrix $\mathbf{A}$ is Hermitian if $\mathbf{A} = \mathbf{A}^{\mathrm{H}}$.

Only square matrices can be Hermitian, and the condition $\mathbf{A} = \mathbf{A}^{\mathrm{H}}$ implies a specific symmetry: the entries on opposite sides of the diagonal have the same real part, while their imaginary parts have the same magnitude but opposite signs. The symmetry implies that any eigenvalue $\lambda$ of $\mathbf{A}$ must satisfy $\lambda = \lambda^*$, which only holds if its imaginary part is zero. Hence, all the eigenvalues of Hermitian matrices are real-valued. Covariance matrices, which will be described later in this chapter, are a common type of matrix that satisfies the Hermitian property. Before considering the eigendecomposition of Hermitian matrices, we will define one more type of matrix.


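Definition 2.4. A matrix $\mathbf{U} \in \mathbb{C}^{M \times M}$ is unitary if $\mathbf{U}^{\mathrm{H}}\mathbf{U} = \mathbf{I}_M$ and $\mathbf{U}\mathbf{U}^{\mathrm{H}} = \mathbf{I}_M$. The former implies that the columns of $\mathbf{U}$ are mutually orthogonal and have unit length, while the latter implies the same for the rows.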


The column vectors of a unitary matrix form an orthonormal basis in $\mathbb{C}^M$. We notice that the conjugate transpose $\mathbf{U}^{\mathrm{H}}$ of a unitary matrix $\mathbf{U}$ acts as a matrix inverse because their product is an identity matrix. This is the matrix extension of how $1/u$ is the inverse of the scalar $u$ because their product is 1. If unit-length eigenvectors of a Hermitian matrix are placed as the columns of a matrix (choosing them mutually orthogonal when an eigenvalue is repeated), the result is a unitary matrix.


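Lemma 2.1. Any Hermitian $M \times M$ matrix $\mathbf{A}$ can be factorized as

$$ \mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{U}^{\mathrm{H}}, \tag{2.39} $$

where $\mathbf{U}$ is a unitary $M \times M$ matrix containing the unit-length eigenvectors as columns and $\mathbf{D} = \mathrm{diag}(\lambda_1, \ldots, \lambda_M)$ is a diagonal matrix containing the corresponding real-valued eigenvalues.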


The factorization in (2.39) is known as the eigendecomposition. For a Hermitian matrix, the rank is exactly equal to the number of non-zero eigenvalues. If we let $\mathbf{u}_1, \ldots, \mathbf{u}_M$ denote the columns of $\mathbf{U}$ (i.e., the eigenvectors), then we can also express (2.39) as

$$ \mathbf{A} = \sum_{m=1}^{M} \lambda_m\mathbf{u}_m\mathbf{u}_m^{\mathrm{H}}. \tag{2.40} $$

Hence, the matrix is the sum of the outer products $\mathbf{u}_m\mathbf{u}_m^{\mathrm{H}}$ of the eigenvectors, each weighted by the corresponding eigenvalue. This property can be utilized to diagonalize the matrix. More precisely, we can rearrange (2.39) as

$$ \mathbf{U}^{\mathrm{H}}\mathbf{A}\mathbf{U} = \mathbf{D} \tag{2.41} $$


by utilizing the properties of unitary matrices. This shows how the Hermitian 
matrix A can be transformed into the diagonal matrix D with eigenvalues by 
multiplying with the matrix U containing the eigenvectors. 

Non-Hermitian square matrices can also be diagonalized, but the notation 
is more complicated, and one can find special cases where it is not possible. 
Since we will not utilize those results, we will not cover them here. 

If all the eigenvalues of a Hermitian matrix $\mathbf{A}$ are non-zero, then the matrix is invertible. This implies that there exists a matrix denoted as $\mathbf{A}^{-1}$ with the property that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_M$. By utilizing the eigendecomposition in (2.39), we can notice that the inverse can be computed as

$$ \mathbf{A}^{-1} = \mathbf{U}\mathbf{D}^{-1}\mathbf{U}^{\mathrm{H}}, \tag{2.42} $$

where $\mathbf{D}^{-1} = \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_M^{-1})$. Hence, the inverse matrix has the same eigenvectors but reciprocal eigenvalues.
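The eigendecomposition (2.39) and the inverse (2.42) can be sketched without any linear algebra library for a $2 \times 2$ example. The matrix is a hypothetical real symmetric matrix, which is a special case of a Hermitian matrix, so the conjugate transpose reduces to the ordinary transpose. The helper `eigh2` is a hypothetical closed-form routine that only handles real symmetric $2 \times 2$ matrices with a non-zero off-diagonal entry.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def eigh2(A):
    """Eigenvalues and unit-length eigenvectors (as columns of U) of a real
    symmetric 2 x 2 matrix [[a, b], [b, d]] with b != 0."""
    (a, b), (_, d) = A
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    lams = [(tr - disc) / 2, (tr + disc) / 2]
    cols = [[b / math.hypot(b, lam - a), (lam - a) / math.hypot(b, lam - a)]
            for lam in lams]                      # eigenvector is proportional to [b, lam - a]
    return lams, transpose(cols)

A = [[2.0, 1.0], [1.0, 3.0]]                      # hypothetical real symmetric matrix
lams, U = eigh2(A)
D = [[lams[0], 0.0], [0.0, lams[1]]]
D_inv = [[1 / lams[0], 0.0], [0.0, 1 / lams[1]]]

print(matmul(transpose(U), U))                    # identity: U is unitary (real orthogonal)
A_rebuilt = matmul(matmul(U, D), transpose(U))    # (2.39)
A_inv = matmul(matmul(U, D_inv), transpose(U))    # (2.42)
print(A_rebuilt)
print(matmul(A, A_inv))                           # identity up to rounding
```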

If all the eigenvalues of a Hermitian matrix $\mathbf{A}$ are non-negative, then the matrix is said to be positive semi-definite. The reason is that $\mathbf{x}^{\mathrm{H}}\mathbf{A}\mathbf{x} \geq 0$ for all vectors $\mathbf{x}$ of matching dimension, because (2.40) implies that $\mathbf{x}^{\mathrm{H}}\mathbf{A}\mathbf{x} = \sum_{m=1}^{M} \lambda_m|\mathbf{u}_m^{\mathrm{H}}\mathbf{x}|^2$, which only has non-negative terms. For such matrices, we can define the square root of the matrix as follows.
Lemma 2.2. Any Hermitian $M \times M$ matrix $\mathbf{A}$ that is also positive semi-definite has a square root defined as

$$ \mathbf{A}^{1/2} = \mathbf{U}\mathbf{D}^{1/2}\mathbf{U}^{\mathrm{H}}, \tag{2.43} $$

using the notation from Lemma 2.1 with $\mathbf{D}^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_M})$. The square root $\mathbf{A}^{1/2}$ is also Hermitian and satisfies the property $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$.


If all the eigenvalues of the Hermitian matrix $\mathbf{A}$ are strictly positive, then the matrix is said to be positive definite. In this case, both the matrix and its square root are invertible. The inverse square root is denoted as

$$ \mathbf{A}^{-1/2} = \mathbf{U}\mathbf{D}^{-1/2}\mathbf{U}^{\mathrm{H}}, \tag{2.44} $$

where $\mathbf{D}^{-1/2} = \mathrm{diag}(1/\sqrt{\lambda_1}, \ldots, 1/\sqrt{\lambda_M})$.
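The square root (2.43) and inverse square root (2.44) can be sketched in the same way. The matrix is a hypothetical real symmetric, positive definite example (both eigenvalues are positive), and `eigh2` is the same hypothetical closed-form $2 \times 2$ routine as before. The checks confirm $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$ and $\mathbf{A}^{-1/2}\mathbf{A}\mathbf{A}^{-1/2} = \mathbf{I}_2$.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def eigh2(A):
    """Eigenvalues and unit-length eigenvectors (as columns of U) of a real
    symmetric 2 x 2 matrix [[a, b], [b, d]] with b != 0."""
    (a, b), (_, d) = A
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    lams = [(tr - disc) / 2, (tr + disc) / 2]
    cols = [[b / math.hypot(b, lam - a), (lam - a) / math.hypot(b, lam - a)]
            for lam in lams]
    return lams, transpose(cols)

A = [[2.0, 1.0], [1.0, 3.0]]     # hypothetical positive definite matrix
lams, U = eigh2(A)

D_half = [[math.sqrt(lams[0]), 0.0], [0.0, math.sqrt(lams[1])]]
D_minus_half = [[1 / math.sqrt(lams[0]), 0.0], [0.0, 1 / math.sqrt(lams[1])]]

A_half = matmul(matmul(U, D_half), transpose(U))              # (2.43)
A_minus_half = matmul(matmul(U, D_minus_half), transpose(U))  # (2.44)

print(matmul(A_half, A_half))                                 # approximately A
print(matmul(matmul(A_minus_half, A), A_minus_half))          # approximately I
```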
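Example 2.7. Consider a Hermitian matrix $\mathbf{A} \in \mathbb{C}^{M \times M}$ with the eigendecomposition

$$ \mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{U}^{\mathrm{H}}. \tag{2.45} $$

What is the eigendecomposition of $\mathbf{B} = \mathbf{A} + \epsilon\mathbf{I}_M$ if $\epsilon > 0$?

Since $\mathbf{U}$ is a unitary matrix (i.e., $\mathbf{U}\mathbf{U}^{\mathrm{H}} = \mathbf{I}_M$), we can express $\mathbf{B}$ as

$$ \mathbf{B} = \mathbf{A} + \epsilon\mathbf{I}_M = \mathbf{U}\mathbf{D}\mathbf{U}^{\mathrm{H}} + \epsilon\mathbf{U}\mathbf{U}^{\mathrm{H}} = \mathbf{U}(\mathbf{D} + \epsilon\mathbf{I}_M)\mathbf{U}^{\mathrm{H}}, \tag{2.46} $$

which has the correct structure to be its eigendecomposition. Hence, adding a scaled identity matrix to $\mathbf{A}$ does not change the eigenvectors, but all the eigenvalues are increased by the scaling factor $\epsilon$.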
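A small numerical illustration of Example 2.7 is sketched below for a hypothetical $2 \times 2$ real symmetric matrix and a hypothetical $\epsilon = 0.5$, again using the hypothetical closed-form helper `eigh2` for real symmetric $2 \times 2$ matrices.

```python
import math

def eigh2(A):
    """Eigenvalues (ascending) and unit-length eigenvectors of a real
    symmetric 2 x 2 matrix [[a, b], [b, d]] with b != 0."""
    (a, b), (_, d) = A
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    lams = [(tr - disc) / 2, (tr + disc) / 2]
    vecs = [[b / math.hypot(b, lam - a), (lam - a) / math.hypot(b, lam - a)]
            for lam in lams]
    return lams, vecs

A = [[2.0, 1.0], [1.0, 3.0]]     # hypothetical Hermitian (real symmetric) matrix
eps = 0.5                         # hypothetical epsilon > 0
B = [[A[0][0] + eps, A[0][1]], [A[1][0], A[1][1] + eps]]   # B = A + eps I

lams_A, vecs_A = eigh2(A)
lams_B, vecs_B = eigh2(B)
print([lb - la for la, lb in zip(lams_A, lams_B)])   # each eigenvalue grows by eps
print(vecs_A)
print(vecs_B)                                         # same eigenvectors up to rounding
```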


The following matrix inversion lemma can be helpful when analyzing 
expressions containing invertible matrices. 


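Lemma 2.3. Consider the matrices $\mathbf{A} \in \mathbb{C}^{M \times M}$, $\mathbf{B} \in \mathbb{C}^{M \times K}$, $\mathbf{C} \in \mathbb{C}^{K \times K}$, and $\mathbf{D} \in \mathbb{C}^{K \times M}$. The following identity holds if all the involved inverses exist:

$$ (\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}\left(\mathbf{D}\mathbf{A}^{-1}\mathbf{B} + \mathbf{C}^{-1}\right)^{-1}\mathbf{D}\mathbf{A}^{-1}. \tag{2.47} $$

A special case of this lemma, known as the rank-one update formula, is obtained when $\mathbf{A}$ is an invertible Hermitian matrix, $\mathbf{C} = 1$, $\mathbf{B} = \mathbf{x} \in \mathbb{C}^M$ is a vector, and $\mathbf{D} = \mathbf{x}^{\mathrm{H}}$:

$$ (\mathbf{A} + \mathbf{x}\mathbf{x}^{\mathrm{H}})^{-1} = \mathbf{A}^{-1} - \frac{1}{1 + \mathbf{x}^{\mathrm{H}}\mathbf{A}^{-1}\mathbf{x}}\mathbf{A}^{-1}\mathbf{x}\mathbf{x}^{\mathrm{H}}\mathbf{A}^{-1}. \tag{2.48} $$

If we multiply the expression in (2.48) by $\mathbf{x}$ from the right, we obtain

$$ (\mathbf{A} + \mathbf{x}\mathbf{x}^{\mathrm{H}})^{-1}\mathbf{x} = \frac{1}{1 + \mathbf{x}^{\mathrm{H}}\mathbf{A}^{-1}\mathbf{x}}\mathbf{A}^{-1}\mathbf{x}, \tag{2.49} $$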
which shows that the vectors $\mathbf{A}^{-1}\mathbf{x}$ and $(\mathbf{A} + \mathbf{x}\mathbf{x}^{\mathrm{H}})^{-1}\mathbf{x}$ are equal except for the scaling factor $1/(1 + \mathbf{x}^{\mathrm{H}}\mathbf{A}^{-1}\mathbf{x})$; that is, they point in the same direction. This property will be utilized in this book when analyzing different signal processing methods.
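The rank-one update formula (2.48) and its consequence (2.49) can be verified numerically by comparing with a direct inversion. This sketch assumes a hypothetical invertible real symmetric $2 \times 2$ matrix and a hypothetical real vector, so that $\mathbf{x}^{\mathrm{H}} = \mathbf{x}^{\mathrm{T}}$; the $2 \times 2$ inverses use the standard adjugate formula.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

A = [[4.0, 1.0], [1.0, 3.0]]    # hypothetical invertible real symmetric matrix
x = [1.0, -2.0]                 # hypothetical real vector, so x^H = x^T

A_inv = inv2(A)
Ainv_x = matvec(A_inv, x)                                 # A^{-1} x
denom = 1 + sum(xi * wi for xi, wi in zip(x, Ainv_x))     # 1 + x^H A^{-1} x

# Right-hand side of (2.48); A^{-1} x x^H A^{-1} = (A^{-1} x)(A^{-1} x)^T here
# because A^{-1} is symmetric
update = [[A_inv[i][j] - Ainv_x[i] * Ainv_x[j] / denom for j in range(2)]
          for i in range(2)]

# Direct inversion of A + x x^H for comparison
direct = inv2([[A[i][j] + x[i] * x[j] for j in range(2)] for i in range(2)])

lhs_249 = matvec(direct, x)                   # (A + x x^H)^{-1} x
rhs_249 = [w / denom for w in Ainv_x]         # right-hand side of (2.49)
print(update, direct)
print(lhs_249, rhs_249)
```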

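Consider two matrices $\mathbf{A} \in \mathbb{C}^{M \times K}$ and $\mathbf{B} \in \mathbb{C}^{K \times M}$ having opposite dimensions, which means that it is feasible to compute both the matrix products $\mathbf{A}\mathbf{B}$ and $\mathbf{B}\mathbf{A}$. A matrix identity similar to the matrix inversion lemma is

$$ \begin{aligned} (\mathbf{A}\mathbf{B} + \mathbf{I}_M)^{-1}\mathbf{A} &= (\mathbf{A}\mathbf{B} + \mathbf{I}_M)^{-1}\mathbf{A}(\mathbf{B}\mathbf{A} + \mathbf{I}_K)(\mathbf{B}\mathbf{A} + \mathbf{I}_K)^{-1} \\ &= (\mathbf{A}\mathbf{B} + \mathbf{I}_M)^{-1}(\mathbf{A}\mathbf{B} + \mathbf{I}_M)\mathbf{A}(\mathbf{B}\mathbf{A} + \mathbf{I}_K)^{-1} \\ &= \mathbf{A}(\mathbf{B}\mathbf{A} + \mathbf{I}_K)^{-1}, \end{aligned} \tag{2.50} $$

where the second equality follows from $\mathbf{A}(\mathbf{B}\mathbf{A} + \mathbf{I}_K) = (\mathbf{A}\mathbf{B} + \mathbf{I}_M)\mathbf{A}$. As a result, the matrix $\mathbf{A}$ is moved from one side of the inverse to the other side. The content of the inverse also changes and, interestingly, the identity matrix changes dimension. There is a deeper matrix algebraic property enabling this result. The eigenvalues of $\mathbf{A}\mathbf{B}$ and $\mathbf{B}\mathbf{A}$ are always the same, except that the bigger of these matrices has $|M - K|$ extra eigenvalues that are equal to zero. This can be proved as follows. We let $\mathbf{u}$ denote an arbitrary eigenvector of $\mathbf{A}\mathbf{B}$ associated with a non-zero eigenvalue $\lambda$, so that $\mathbf{A}\mathbf{B}\mathbf{u} = \lambda\mathbf{u}$. It then follows that $\mathbf{B}\mathbf{u}$ is an eigenvector of $\mathbf{B}\mathbf{A}$ with the same eigenvalue because

$$ \lambda\mathbf{B}\mathbf{u} = \mathbf{B}(\lambda\mathbf{u}) = \mathbf{B}(\mathbf{A}\mathbf{B}\mathbf{u}) = \mathbf{B}\mathbf{A}(\mathbf{B}\mathbf{u}), \tag{2.51} $$

where $\mathbf{B}\mathbf{u} \neq \mathbf{0}$ since $\mathbf{A}\mathbf{B}\mathbf{u} = \lambda\mathbf{u} \neq \mathbf{0}$.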
One can further prove that the eigenvalue multiplicities are the same. A consequence is that we can switch the matrix order in the trace function as

$$ \mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A}) \tag{2.52} $$

because the sum of the eigenvalues is the same for $\mathbf{A}\mathbf{B}$ and $\mathbf{B}\mathbf{A}$. Another consequence is Sylvester's determinant theorem

$$ \det(\mathbf{A}\mathbf{B} + \mathbf{I}_M) = \det(\mathbf{B}\mathbf{A} + \mathbf{I}_K), \tag{2.53} $$

which holds because the identity matrix adds one to all the eigenvalues (turning the extra zero-valued eigenvalues into ones), and the determinant then multiplies them together. The matrix identities in (2.52) and (2.53) will be used repeatedly in this book.
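Both (2.52) and (2.53) can be checked exactly with small integer matrices. In this minimal sketch, $\mathbf{A}$ is a hypothetical $2 \times 3$ matrix and $\mathbf{B}$ a hypothetical $3 \times 2$ matrix, so $\mathbf{A}\mathbf{B}$ is $2 \times 2$ and $\mathbf{B}\mathbf{A}$ is $3 \times 3$. The determinant uses cofactor expansion, which is only practical for small matrices.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def add_identity(X):
    return [[X[i][j] + (1 if i == j else 0) for j in range(len(X))] for i in range(len(X))]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def det(X):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** j * X[0][j] * det([row[:j] + row[j + 1:] for row in X[1:]])
               for j in range(len(X)))

A = [[1, 2, 0], [0, 1, -1]]          # hypothetical 2 x 3 matrix (M = 2, K = 3)
B = [[1, 0], [2, 1], [-1, 3]]        # hypothetical 3 x 2 matrix
AB, BA = matmul(A, B), matmul(B, A)  # 2 x 2 and 3 x 3

print(trace(AB), trace(BA))                            # (2.52): 3 3
print(det(add_identity(AB)), det(add_identity(BA)))    # (2.53): -12 -12
```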

Consider the two vectors $\mathbf{x} = [x_1, x_2, \ldots, x_M]^{\mathrm{T}}$ and $\mathbf{y} = [y_1, y_2, \ldots, y_K]^{\mathrm{T}}$, which might have different dimensions. The Kronecker product between these vectors is defined as

$$ \mathbf{x} \otimes \mathbf{y} = \begin{bmatrix} x_1\mathbf{y} \\ x_2\mathbf{y} \\ \vdots \\ x_M\mathbf{y} \end{bmatrix}, \tag{2.54} $$

which is an $MK$-dimensional vector. The first $K$ entries contain $x_1$ multiplied by each of the entries of $\mathbf{y}$, the next $K$ entries contain $x_2$ multiplied by each of the entries of $\mathbf{y}$, and so on. The Kronecker product is closely related to the outer product $\mathbf{y}\mathbf{x}^{\mathrm{T}}$ between the same vectors: one obtains the Kronecker product by stacking the columns of $\mathbf{y}\mathbf{x}^{\mathrm{T}}$ into a single vector.
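The relation between the Kronecker product (2.54) and the outer product $\mathbf{y}\mathbf{x}^{\mathrm{T}}$ can be sketched with plain lists. The integer vectors are hypothetical; the outer product is stored as a list of rows, and its columns are read out one after another.

```python
def kron(x, y):
    """Kronecker product (2.54) of two vectors stored as lists."""
    return [xm * yk for xm in x for yk in y]

x = [1, 2, 3]      # M = 3 (hypothetical values)
y = [10, 20]       # K = 2

outer = [[yk * xm for xm in x] for yk in y]    # y x^T, a K x M matrix (list of rows)
stacked = [outer[k][m] for m in range(len(x)) for k in range(len(y))]  # column by column

print(kron(x, y))   # [10, 20, 20, 40, 30, 60]
print(stacked)      # [10, 20, 20, 40, 30, 60]
```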
2.2 Probability Theory


This book will use random variables to describe signals, noise, and communication channels. Any continuous random variable $x$ is entirely determined by its probability density function (PDF), which we denote by $f_x(x)$. This function determines how the probability mass is distributed over all possible realizations. The realizations of the random variable take values in some sample set $\Omega$, which is typically the real space $\mathbb{R}$ or the complex space $\mathbb{C}$. The probability of obtaining a realization in a subset $\mathcal{A} \subseteq \Omega$ of the sample set is the integral of the PDF over that subset:

$$ \Pr\{x \in \mathcal{A}\} = \int_{\mathcal{A}} f_x(x)\,dx. \tag{2.55} $$

When considering complex random variables, the integral in (2.55) should be interpreted as a double integral over the real and imaginary parts. The PDF $f_x(x)$ is non-negative for all $x \in \Omega$ and the total probability is one: $\int_{\Omega} f_x(x)\,dx = 1$. Hence, the probability $\Pr\{x \in \mathcal{A}\}$ is between zero and one. Based on the PDF, we can compute the (arithmetic) mean

$$ \mathbb{E}\{x\} = \int_{\Omega} x f_x(x)\,dx, \tag{2.56} $$


which is also known as the expected value, first moment, and average. The variability is often measured by computing the squared deviation $|x - \mathbb{E}\{x\}|^2$ from the mean and taking its mean. It is denoted $\mathrm{Var}\{x\}$ and computed as

$$ \mathrm{Var}\{x\} = \mathbb{E}\left\{|x - \mathbb{E}\{x\}|^2\right\} = \mathbb{E}\{|x|^2\} - |\mathbb{E}\{x\}|^2. \tag{2.57} $$

This is known as the variance or second central moment, and it measures how large variations from the mean we can expect to observe when generating many realizations. It is essential to use magnitudes in (2.57) when the random variable takes complex values. If the random variable has zero mean, then (2.57) shows that the variance coincides with the quadratic mean (mean-square value) computed as

$$ \mathbb{E}\{|x|^2\} = \int_{\Omega} |x|^2 f_x(x)\,dx. \tag{2.58} $$
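The two expressions for the variance in (2.57) can be compared with a sample-based sketch, where each expectation is replaced by an average over realizations (i.e., each sample is treated as an equally likely outcome, in which case the two forms agree up to rounding). The complex random variable is a hypothetical example with real part $\sim \mathcal{N}(1, 4)$ and imaginary part $\sim \mathcal{N}(-0.5, 1)$, so its true variance is $4 + 1 = 5$; note that `random.gauss` takes the standard deviation, not the variance.

```python
import random

random.seed(0)
L = 10_000
# Hypothetical complex random variable: real part ~ N(1, 4), imaginary part ~ N(-0.5, 1)
samples = [complex(random.gauss(1, 2), random.gauss(-0.5, 1)) for _ in range(L)]

mean = sum(samples) / L                                           # sample version of (2.56)
var_def = sum(abs(s - mean) ** 2 for s in samples) / L            # E{|x - E{x}|^2}
var_alt = sum(abs(s) ** 2 for s in samples) / L - abs(mean) ** 2  # E{|x|^2} - |E{x}|^2
print(mean, var_def, var_alt)    # both variance estimates are close to 5
```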


It is common in the probability theory literature to use a different notation for the random variable and its realizations; for example, $X$ for the variable and $x$ for a realization. In this book, we have instead chosen to use the same notation for both and state explicitly what is considered in each context.


Definition 2.5. The random variables $x$ and $y$ are statistically independent if their joint PDF $f_{x,y}(x, y)$ can be factorized as

$$ f_{x,y}(x, y) = f_x(x)f_y(y), \tag{2.59} $$

where $f_x(x)$ and $f_y(y)$ are their individual PDFs, called marginal PDFs.




This independence concept is entirely different from the linear indepen- 
dence of vectors in Definition 2.1. Statistical independence of random variables 
implies that the realization of x will not affect the realization of y whatsoever, 
which happens in practice when the variables are associated with different 
sources of randomness. For example, in communications, the variable rep- 
resenting random data from the transmitter is typically independent of the 
variable representing random thermal noise in the receiver hardware. 

We will now consider L independent realizations of the same random 
variable, which can be thought of as having L independent and identically 
distributed random variables (i.e., with the same marginal PDF), and generate 
one realization from each of them. If we compute the arithmetic average of these realizations, we will obtain a value close to the mean in (2.56), at least under the technical condition that the variance is finite. This result can be formalized mathematically as the following law of large numbers.


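Lemma 2.4. Let $x_1, \ldots, x_L$ be a sequence of $L$ independent and identically distributed random variables with mean $\mathbb{E}\{x_i\} = \mu$ and finite variance $\sigma^2$ for $i = 1, \ldots, L$. The arithmetic sample average $\frac{1}{L}\sum_{i=1}^{L} x_i$ satisfies

$$ \lim_{L \to \infty} \frac{1}{L}\sum_{i=1}^{L} x_i = \mu \tag{2.60} $$

with probability one.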


We will utilize this lemma when studying the impact of random variables 
on communication performance and also as a way to approximate an unknown 
mean value using multiple realizations from the random variable. 
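The law of large numbers can be illustrated with a short simulation. As a hypothetical example, the sketch draws realizations uniformly distributed on $[0, 1]$ (mean $\mu = 0.5$, finite variance $1/12$) and prints how far the sample average is from $\mu$ for increasing $L$.

```python
import random

random.seed(42)
mu = 0.5   # mean of the uniform distribution on [0, 1]

errors = {}
for L in (10, 1_000, 100_000):
    average = sum(random.random() for _ in range(L)) / L   # sample average (1/L) sum_i x_i
    errors[L] = abs(average - mu)
    print(L, average, errors[L])   # the deviation tends to shrink as L grows
```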

The variance measures the average squared deviation, which has a different unit than the original variable (i.e., it is squared). The square root $\sqrt{\mathrm{Var}\{x\}}$ of the variance can be utilized to better understand how large deviations from the mean are likely to occur. This measure is called the standard deviation, and whenever the variance is finite, most random realizations will occur within a few standard deviations from the mean. The exact characteristics depend on the distribution of the random variable, but the following worst-case result, known as Chebyshev's inequality, can be established.


Lemma 2.5. Consider a random variable $x$ with mean $\mathbb{E}\{x\} = \mu$ and finite standard deviation $\sigma = \sqrt{\mathrm{Var}\{x\}}$. For any constant $k > 0$, it holds that

$$ \Pr\{|x - \mu| \geq k\sigma\} \leq \frac{1}{k^2}. \tag{2.61} $$


If we insert $k = 2$ or $k = 3$ into Lemma 2.5, the inequality says that the probability of obtaining realizations that are at least two or three standard deviations from the mean is at most 0.25 and $1/9 \approx 0.11$, respectively:

$$ \Pr\{|x - \mu| \geq k\sigma\} \leq \begin{cases} 1/4 = 0.25 & \text{if } k = 2, \\ 1/9 \approx 0.11 & \text{if } k = 3. \end{cases} \tag{2.62} $$

Since Chebyshev's inequality provides an upper bound on the probability of obtaining realizations further than $k$ standard deviations from the mean, most distributions encountered in practice have a much smaller probability than that. In other words, Chebyshev's inequality characterizes the worst-case situation of having a distribution with a high probability of realizations far from the mean.

[Figure 2.5: Gaussian PDF with the interval containing 95% of all realizations marked]

Figure 2.5: The PDF of the zero-mean Gaussian distribution $x \sim \mathcal{N}(0, \sigma^2)$ with the standard deviation indicated. If another mean value is considered, the PDF is shifted to be centered around it. Approximately 95% of all realizations occur between $-2\sigma$ and $2\sigma$.
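To see how conservative Chebyshev's inequality can be, the sketch below uses a hypothetical exponential distribution with rate 1, for which the mean and standard deviation both equal 1. Since realizations are non-negative, $|x - 1| \geq k$ means $x \geq 1 + k$ for $k \geq 2$, with the exact probability $e^{-(1+k)}$.

```python
import math
import random

random.seed(7)
L = 200_000
# Hypothetical example: exponential distribution with rate 1, so mean = std = 1
samples = [random.expovariate(1.0) for _ in range(L)]
mu, sigma = 1.0, 1.0

results = {}
for k in (2, 3):
    empirical = sum(abs(s - mu) >= k * sigma for s in samples) / L
    exact = math.exp(-(1 + k))     # Pr{x >= 1 + k} for this distribution
    bound = 1 / k ** 2             # Chebyshev bound (2.61)
    results[k] = empirical
    print(k, empirical, exact, bound)
```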


2.2.1 Gaussian Distribution 


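A common example of a continuous random variable is a Gaussian random variable, which is denoted as $x \sim \mathcal{N}(\mu, \sigma^2)$ and has the PDF

$$ f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \tag{2.63} $$

This distribution has the mean $\mathbb{E}\{x\} = \mu$, variance $\mathrm{Var}\{x\} = \mathbb{E}\{(x - \mu)^2\} = \sigma^2$, and standard deviation $\sqrt{\mathrm{Var}\{x\}} = \sigma$. The PDF is illustrated in Figure 2.5 and is symmetric around the mean value. When the mean is zero and the variance is one, we have a standard Gaussian distribution.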
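The PDF (2.63) and the probability mass within $k$ standard deviations can be sketched with Python's standard library, where `statistics.NormalDist(mu, sigma)` takes the standard deviation as its second argument. The parameters are hypothetical; for $k = 2$ the Gaussian probability is about 0.95, far above the Chebyshev guarantee of $1 - 1/k^2 = 0.75$.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mu, sigma2):
    """PDF (2.63) of N(mu, sigma2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma = 1.0, 2.0                   # hypothetical parameters
dist = NormalDist(mu, sigma)           # standard-library Gaussian distribution
print(gaussian_pdf(0.3, mu, sigma ** 2), dist.pdf(0.3))   # same value

# Probability of a realization within k standard deviations of the mean,
# compared with the Chebyshev guarantee 1 - 1/k^2 from (2.61)
for k in (2, 3):
    inside = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    print(k, inside, 1 - 1 / k ** 2)
```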
The Gaussian distribution is also known as the normal distribution since
it has become the norm to utilize it as an approximation of other random 
distributions. A contributing factor is the following classical result, called the 
central limit theorem. 


Lemma 2.6. Let $x_1, \ldots, x_L$ be a sequence of $L$ real-valued independent and identically distributed random variables with zero mean and finite variance $\sigma^2$. As $L \to \infty$, the distribution of

$$\frac{1}{\sqrt{L\sigma^2}} \sum_{i=1}^{L} x_i \tag{2.64}$$

converges to a standard Gaussian distribution $\mathcal{N}(0,1)$.


The interpretation of this theorem is that the summation of a set of 
independent and identically distributed random variables tends to be approxi- 
mately Gaussian distributed, with the approximation error being smaller the 
more variables are considered. This property is often used in communications 
to motivate that the noise in the receiver hardware is Gaussian distributed 
(because the random motion of many electrons creates it) and that wireless 
channels behave as Gaussian distributed when they contain many propagation 
paths, which will be considered later in this book. 
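The following Python sketch illustrates Lemma 2.6 by evaluating the normalized sum (2.64) for $L = 30$ independent variables that are uniformly distributed on $[-1, 1]$ (zero mean, variance $1/3$). The uniform summands, $L$, and the number of trials are illustrative assumptions; the sketch only checks two Gaussian signatures, namely a variance close to one and roughly 4.6% of realizations beyond $\pm 2$, as for a standard Gaussian variable.

```python
import random

def normalized_sum(L, sigma2, draw):
    """One realization of (2.64): the sum of L i.i.d. zero-mean draws scaled by 1/sqrt(L*sigma2)."""
    return sum(draw() for _ in range(L)) / (L * sigma2) ** 0.5

def uniform_draw():
    """Assumed summand: uniform on [-1, 1], with zero mean and variance 1/3."""
    return random.uniform(-1.0, 1.0)

random.seed(2)
L, sigma2, trials = 30, 1.0 / 3.0, 20_000
realizations = [normalized_sum(L, sigma2, uniform_draw) for _ in range(trials)]

variance = sum(r * r for r in realizations) / trials          # should be close to 1
beyond_two = sum(abs(r) > 2 for r in realizations) / trials   # standard Gaussian value is about 0.046
```

Replacing `uniform_draw` with another zero-mean distribution (and updating `sigma2`) should give similar results for large enough $L$.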

The scaling factor $1/\sqrt{L\sigma^2}$ in (2.64) was selected so that the variance of the quantity becomes one, instead of going to zero or infinity when adding $L$ terms and letting $L \to \infty$. However, any scaling factor can be utilized along with Lemma 2.6 if the central limit theorem is merely used to motivate that the summation of a finite number of independent random variables is approximately Gaussian distributed. For example, the law of large numbers in Lemma 2.4 considered the sample average, and when combined with Lemma 2.6, we obtain

$$\frac{1}{L}\sum_{i=1}^{L} x_i \overset{\text{approx}}{\sim} \mathcal{N}\left(\mu, \frac{\sigma^2}{L}\right), \tag{2.65}$$

where the notation $\overset{\text{approx}}{\sim}$ means "approximately distributed as". The variance in (2.65) goes to zero as $L \to \infty$, which implies that the sample average converges to the mean $\mu$, as previously stated in the law of large numbers. The added benefit of (2.65) is that it also suggests that the variance goes to zero as $1/L$ and that the deviation from the mean is approximately Gaussian distributed.

The Gaussian distribution has unbounded support (i.e., we can get arbitrarily large positive or negative realizations), but the probability mass is concentrated around the mean. In fact, it is much more concentrated than the worst-case situation determined by Chebyshev’s inequality. The probabilities of obtaining realizations that are beyond one, two, or three standard deviations away from the mean value are

$$\Pr\{|x-\mu| > k\sigma\} = 1 - \int_{-k\sigma}^{k\sigma} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}\,\partial x \approx \begin{cases} 0.32 & \text{if } k=1,\\ 0.05 & \text{if } k=2,\\ 0.003 & \text{if } k=3. \end{cases} \tag{2.66}$$

Hence, only about 5% of all realizations are beyond two standard deviations from the mean, while (2.62) states that it can be the case for up to 25% of all realizations when considering an arbitrary random distribution. Figure 2.5 illustrates that 95% of all realizations appear between $-2\sigma$ and $2\sigma$.
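The numbers in (2.66) can be reproduced with the complementary error function, since $1 - \int_{-k\sigma}^{k\sigma}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{x^2}{2\sigma^2}}\,\partial x = \mathrm{erfc}(k/\sqrt{2})$ for any $\sigma$. A minimal Python sketch using the standard library's `math.erfc`, with the hypothetical helper name `gaussian_tail`:

```python
import math

def gaussian_tail(k):
    """Pr{|x - mu| > k*sigma} for a Gaussian variable, as in (2.66).

    Equals erfc(k / sqrt(2)), which does not depend on mu or sigma.
    """
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"k={k}: Gaussian tail={gaussian_tail(k):.4f}, Chebyshev bound={1 / k**2:.4f}")
```

The printed tails can be placed next to the Chebyshev bound $1/k^2$ to see how far the Gaussian distribution is from the worst case.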


2.2.2 Complex Gaussian Distribution 


We will now consider the complex generalization of the Gaussian distribution. Suppose $a, b \sim \mathcal{N}(0, \sigma^2/2)$ are two independent Gaussian variables, each having zero mean and variance $\sigma^2/2$. The complex variable $x = a + jb$ will then have a complex Gaussian distribution. We denote it as $x \sim \mathcal{N}_{\mathbb{C}}(0,\sigma^2)$ and the PDF is

$$f_x(x) = \frac{1}{\pi\sigma^2} e^{-\frac{|x|^2}{\sigma^2}}. \tag{2.67}$$

This distribution has the mean $\mathbb{E}\{x\} = 0$ and variance

$$\mathrm{Var}\{x\} = \mathbb{E}\{|x|^2\} = \mathbb{E}\{a^2 + b^2\} = \sigma^2, \tag{2.68}$$


where the real and imaginary parts each contribute with $\sigma^2/2$. The PDF in (2.67) is illustrated in Figure 2.6 and has the classical shape of a Gaussian distribution but in two dimensions. There are other types of complex Gaussian distributions than the one described above. To be precise, we have defined what is known as the circularly symmetric complex Gaussian distribution. The circular symmetry refers to the fact that if $x \sim \mathcal{N}_{\mathbb{C}}(0,\sigma^2)$, then $xe^{j\phi}$ has the same distribution for any value of $\phi \in \mathbb{R}$. In other words, the distribution does not change when applying a phase-shift. This property can be proved by noticing that $f_x(x) = f_x(xe^{j\phi})$ for the PDF in (2.67), since $|xe^{j\phi}| = |x|$. The circular symmetry implies that we can rotate the PDF in the complex plane without changing its shape, as seen from Figure 2.6. Looking at the mean value, the circular symmetry implies $\mathbb{E}\{x\} = \mathbb{E}\{xe^{j\phi}\} = e^{j\phi}\mathbb{E}\{x\}$, which only holds for all $\phi \in \mathbb{R}$ if $\mathbb{E}\{x\} = 0$. Hence, all circularly symmetric distributions have zero mean. The circular symmetry follows from the assumptions of having independent and identically distributed real and imaginary parts. One can define other complex Gaussian distributions that do not satisfy these conditions, but these are not considered in this book. We will refer to the circularly symmetric complex Gaussian distribution as the complex Gaussian distribution for brevity.

Multiplying a complex Gaussian random variable by a constant $c \in \mathbb{C}$ will change the variance but not the shape of the distribution. Suppose $x \sim \mathcal{N}_{\mathbb{C}}(0,\sigma^2)$ and recall that the variance can be computed as $\mathbb{E}\{|x|^2\} = \sigma^2$ since the variable has zero mean. The random variable $cx$ will, therefore, have the variance $\mathbb{E}\{|cx|^2\} = |c|^2\mathbb{E}\{|x|^2\} = |c|^2\sigma^2$. This implies that $cx \sim \mathcal{N}_{\mathbb{C}}(0, |c|^2\sigma^2)$.

Figure 2.6: The PDF of the circularly symmetric complex Gaussian distribution $x \sim \mathcal{N}_{\mathbb{C}}(0,1)$. The real and imaginary parts are statistically independent and jointly Gaussian distributed with identical variance.
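A Monte Carlo sketch in Python of the construction above: it draws $x = a + jb$ with independent real and imaginary parts of variance $\sigma^2/2$, then checks the zero mean, the variance in (2.68), the unchanged statistics of the real part after a phase-shift, and the variance $|c|^2\sigma^2$ after scaling. The values $\sigma^2 = 2$, $\phi = 0.7$, $c = 1 - 2j$, the seed, and the sample size are arbitrary assumptions.

```python
import cmath
import math
import random

def complex_gaussian(sigma2):
    """Draw x ~ N_C(0, sigma2) as a + jb with independent a, b ~ N(0, sigma2/2)."""
    std = math.sqrt(sigma2 / 2.0)
    return complex(random.gauss(0.0, std), random.gauss(0.0, std))

random.seed(3)
sigma2, n = 2.0, 100_000
x = [complex_gaussian(sigma2) for _ in range(n)]

mean = sum(x) / n                        # should be close to 0
var = sum(abs(v) ** 2 for v in x) / n    # estimate of E{|x|^2} = sigma2, as in (2.68)

# Phase shift: the real part of x * e^{j*phi} should still have variance sigma2 / 2
phi = 0.7
rotated = [v * cmath.exp(1j * phi) for v in x]
real_var_rotated = sum(v.real ** 2 for v in rotated) / n

# Scaling by c: the variance should become |c|^2 * sigma2
c = 1 - 2j
var_scaled = sum(abs(c * v) ** 2 for v in x) / n
```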


2.2.3 Covariance and Conditional Distribution 


Multiple random variables can affect a communication system, some independent (see Definition 2.5) and others statistically dependent. Consider the two independent random variables $v \sim \mathcal{N}_{\mathbb{C}}(0,\sigma_v^2)$ and $w \sim \mathcal{N}_{\mathbb{C}}(0,\sigma_w^2)$. The summation of these variables is also complex Gaussian distributed and has a variance that is the summation of the individual variances:

$$z = v + w \sim \mathcal{N}_{\mathbb{C}}(0, \sigma_v^2 + \sigma_w^2). \tag{2.69}$$


Although v and w are independent variables, z is clearly dependent on both. 

The variance concept can be extended to measure the covariance between two random variables. For two arbitrary random variables $z$ and $v$, the covariance is defined as

$$\mathbb{E}\{(z - \mathbb{E}\{z\})(v - \mathbb{E}\{v\})^*\} = \mathbb{E}\{zv^*\} - \mathbb{E}\{z\}\mathbb{E}\{v^*\}, \tag{2.70}$$

where the complex conjugate is important when the variables are complex. The variables are said to be uncorrelated if the covariance is zero, while a non-zero covariance measures how strongly the random realization of one variable affects the realization of the other variable. Independent random variables are always uncorrelated, but the converse might not hold: uncorrelated variables can still influence each other’s realizations but in more subtle ways.
For real-valued variables, the covariance in (2.70) can be both positive and negative, and takes values between $-\sqrt{\mathrm{Var}\{z\}\mathrm{Var}\{v\}}$ and $\sqrt{\mathrm{Var}\{z\}\mathrm{Var}\{v\}}$. For complex-valued variables, the covariance is generally complex, and its magnitude is at most $\sqrt{\mathrm{Var}\{z\}\mathrm{Var}\{v\}}$. The bounds are achieved when the two variables, after subtracting their means, are equal up to a scaling factor.


Example 2.8. What is the covariance between $z$ and $v$, defined in (2.69)?
Direct computation based on the covariance definition in (2.70) yields

$$\mathbb{E}\{(z - \mathbb{E}\{z\})(v - \mathbb{E}\{v\})^*\} = \mathbb{E}\{zv^*\} = \mathbb{E}\{vv^*\} + \mathbb{E}\{wv^*\} = \sigma_v^2, \tag{2.71}$$

where the last equality follows from the fact that $\mathbb{E}\{wv^*\} = \mathbb{E}\{w\}\mathbb{E}\{v^*\} = 0$ since $w$ and $v$ are independent. The non-zero covariance demonstrates that $z$ and $v$ are dependent random variables and implies that their realizations are statistically connected, which is logical since $z = v + w$.
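To see (2.71) numerically, the following Python sketch estimates the covariance between $z = v + w$ and $v$ by a sample average. The variances $\sigma_v^2 = 1.5$ and $\sigma_w^2 = 0.5$, the seed, and the sample size are assumptions made for illustration; the estimate should be close to $\sigma_v^2$, and the estimated variance of $z$ close to $\sigma_v^2 + \sigma_w^2$.

```python
import math
import random

def complex_gaussian(sigma2):
    """Draw a sample from N_C(0, sigma2) via independent real and imaginary parts."""
    std = math.sqrt(sigma2 / 2.0)
    return complex(random.gauss(0.0, std), random.gauss(0.0, std))

random.seed(4)
sigma_v2, sigma_w2, n = 1.5, 0.5, 100_000
v = [complex_gaussian(sigma_v2) for _ in range(n)]
z = [vi + complex_gaussian(sigma_w2) for vi in v]   # z = v + w as in (2.69)

# Sample version of (2.70); both variables have zero mean, so E{z v*} is the covariance
cov_zv = sum(zi * vi.conjugate() for zi, vi in zip(z, v)) / n
var_z = sum(abs(zi) ** 2 for zi in z) / n
```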


Suppose we can observe $z$ but want to know the value of $v$. We are then interested in the conditional PDF $f_{v|z}(v|z)$ of $v$ given the realization of $z$. If we know the opposite conditional PDF $f_{z|v}(z|v)$, we can compute $f_{v|z}(v|z)$ using Bayes’ theorem:

$$f_{v|z}(v|z) = \frac{f_{z|v}(z|v)\, f_v(v)}{f_z(z)}. \tag{2.72}$$

This rule says that $f_{v|z}(v|z)$ and $f_{z|v}(z|v)$ are equal up to the scaling factor $f_v(v)/f_z(z)$. We can compute this factor using the marginal PDFs of $z$ and $v$.


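Example 2.9. Determine the conditional PDFs $f_{z|v}(z|v)$ and $f_{v|z}(v|z)$ that relate the random variables $v$ and $z$ that were defined in (2.69).
If we know $v$, then $z - v = w \sim \mathcal{N}_{\mathbb{C}}(0,\sigma_w^2)$. This implies that

$$f_{z|v}(z|v) = \frac{1}{\pi\sigma_w^2} e^{-\frac{|z-v|^2}{\sigma_w^2}}. \tag{2.73}$$

We can now compute $f_{v|z}(v|z)$ using Bayes’ theorem in (2.72), where the marginal PDFs follow from $v \sim \mathcal{N}_{\mathbb{C}}(0,\sigma_v^2)$ and (2.69):

$$\begin{aligned}
f_{v|z}(v|z) &= \frac{\frac{1}{\pi\sigma_w^2} e^{-\frac{|z-v|^2}{\sigma_w^2}} \cdot \frac{1}{\pi\sigma_v^2} e^{-\frac{|v|^2}{\sigma_v^2}}}{\frac{1}{\pi(\sigma_v^2+\sigma_w^2)} e^{-\frac{|z|^2}{\sigma_v^2+\sigma_w^2}}}
= \frac{\sigma_v^2+\sigma_w^2}{\pi\sigma_v^2\sigma_w^2} e^{-\frac{|z-v|^2}{\sigma_w^2} - \frac{|v|^2}{\sigma_v^2} + \frac{|z|^2}{\sigma_v^2+\sigma_w^2}} \\
&= \frac{1}{\pi\frac{\sigma_v^2\sigma_w^2}{\sigma_v^2+\sigma_w^2}} e^{-\frac{\left|v - \frac{\sigma_v^2}{\sigma_v^2+\sigma_w^2}z\right|^2}{\frac{\sigma_v^2\sigma_w^2}{\sigma_v^2+\sigma_w^2}}},
\end{aligned} \tag{2.74}$$

where the last equality is obtained by expanding the squared magnitudes in the exponent and completing the square in $v$. This conditional PDF resembles that of the complex Gaussian distribution. In particular, $v - \frac{\sigma_v^2}{\sigma_v^2+\sigma_w^2}z \sim \mathcal{N}_{\mathbb{C}}\left(0, \frac{\sigma_v^2\sigma_w^2}{\sigma_v^2+\sigma_w^2}\right)$ when $z$ is known.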
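The result of Example 2.9 can be checked by simulation. In this Python sketch, with the illustrative assumption $\sigma_v^2 = 1.5$ and $\sigma_w^2 = 0.5$, the residual $v - \frac{\sigma_v^2}{\sigma_v^2+\sigma_w^2}z$ should have a variance close to $\frac{\sigma_v^2\sigma_w^2}{\sigma_v^2+\sigma_w^2} = 0.375$. The sketch also estimates the correlation between the residual and $z$, which should be close to zero; this is consistent with (2.74) but is only a sanity check, not a proof.

```python
import math
import random

def complex_gaussian(sigma2):
    """Draw a sample from N_C(0, sigma2) via independent real and imaginary parts."""
    std = math.sqrt(sigma2 / 2.0)
    return complex(random.gauss(0.0, std), random.gauss(0.0, std))

random.seed(5)
sigma_v2, sigma_w2, n = 1.5, 0.5, 100_000
gain = sigma_v2 / (sigma_v2 + sigma_w2)                 # conditional mean of v is gain * z
cond_var = sigma_v2 * sigma_w2 / (sigma_v2 + sigma_w2)  # conditional variance from (2.74)

v = [complex_gaussian(sigma_v2) for _ in range(n)]
z = [vi + complex_gaussian(sigma_w2) for vi in v]       # z = v + w as in (2.69)
residual = [vi - gain * zi for vi, zi in zip(v, z)]

res_var = sum(abs(r) ** 2 for r in residual) / n                    # compare with cond_var
res_cov = sum(r * zi.conjugate() for r, zi in zip(residual, z)) / n  # should be close to 0
```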
2.2.4 Multivariate Complex Gaussian Distribution


A random vector can be created by taking a collection of $M$ random scalar variables $x_1, \ldots, x_M$ and collecting them in a vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_M \end{bmatrix}. \tag{2.75}$$


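This is also known as a multivariate random variable, and the mean is denoted as $\mathbb{E}\{\mathbf{x}\}$. The variance of the individual entries and the covariance between any pair of entries is captured by the covariance matrix $\mathrm{Cov}\{\mathbf{x}\}$ defined as

$$\mathrm{Cov}\{\mathbf{x}\} = \mathbb{E}\{(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})^H\}. \tag{2.76}$$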


If we take the conjugate (Hermitian) transpose of this expression, we will get the same expression, which shows that all covariance matrices are Hermitian matrices (see Definition 2.3). The covariance matrix is also positive semi-definite because, for any deterministic $\mathbf{y} \in \mathbb{C}^M$, it holds that

$$\mathbf{y}^H\mathrm{Cov}\{\mathbf{x}\}\mathbf{y} = \mathbb{E}\{\mathbf{y}^H(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})^H\mathbf{y}\} = \mathbb{E}\left\{\left|\mathbf{y}^H(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})\right|^2\right\} \geq 0. \tag{2.77}$$

The correlation matrix is similarly defined as $\mathbb{E}\{\mathbf{x}\mathbf{x}^H\}$ without subtracting the mean. This implies that a deterministic vector $\mathbf{x}$ has a zero-valued covariance matrix but $\mathbf{x}\mathbf{x}^H$ as its correlation matrix. Hence, the covariance matrix is a better measure of the amount of randomness in the considered vector.
Suppose the $M$ variables are independent and identically distributed complex Gaussian variables with variance $\sigma^2$; that is, $x_m \sim \mathcal{N}_{\mathbb{C}}(0,\sigma^2)$ for $m = 1, \ldots, M$. The mean value is $\mathbb{E}\{\mathbf{x}\} = \mathbf{0}$ since each of the individual variables has a zero mean. Moreover, the covariance matrix is

$$\mathrm{Cov}\{\mathbf{x}\} = \mathbb{E}\{(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})^H\} = \mathbb{E}\{\mathbf{x}\mathbf{x}^H\} = \sigma^2\mathbf{I}_M, \tag{2.78}$$

where the diagonal entries are the variances of the individual entries and the zero-valued off-diagonal entries represent that the independent variables have zero covariance. This multivariate complex Gaussian distribution with independent entries is denoted as

$$\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2\mathbf{I}_M). \tag{2.79}$$

This distribution is often utilized to model receiver noise in communication systems. It is then referred to as white Gaussian noise, where the color (or lack thereof) indicates the independence of the entries. Following Definition 2.5, the PDF of $\mathbf{x}$ is the product of $M$ marginal PDFs of the kind in (2.67):

$$f_{\mathbf{x}}(\mathbf{x}) = \prod_{m=1}^{M} \frac{1}{\pi\sigma^2} e^{-\frac{|x_m|^2}{\sigma^2}} = \frac{1}{(\pi\sigma^2)^M} e^{-\frac{\|\mathbf{x}\|^2}{\sigma^2}}. \tag{2.80}$$
In this book, we will also consider a complex Gaussian random vector with correlated entries, in which case the covariance matrix is not a scaled identity matrix. We can create such a vector by starting from a $K$-length unit-variance complex Gaussian random vector with independent entries $\bar{\mathbf{x}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_K)$ and an $M \times K$ deterministic matrix $\mathbf{A}$. We can then create an $M$-length complex Gaussian random vector $\mathbf{x}$ by computing the product

$$\mathbf{x} = \mathbf{A}\bar{\mathbf{x}}, \tag{2.81}$$

irrespective of whether $M$ and $K$ are the same or different. This new random vector has zero mean since

$$\mathbb{E}\{\mathbf{x}\} = \mathbf{A}\,\mathbb{E}\{\bar{\mathbf{x}}\} = \mathbf{0}. \tag{2.82}$$

The covariance matrix can be computed as

$$\mathrm{Cov}\{\mathbf{x}\} = \mathbb{E}\{(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})(\mathbf{x} - \mathbb{E}\{\mathbf{x}\})^H\} = \mathbb{E}\{\mathbf{x}\mathbf{x}^H\} = \mathbf{A}\underbrace{\mathbb{E}\{\bar{\mathbf{x}}\bar{\mathbf{x}}^H\}}_{=\mathbf{I}_K}\mathbf{A}^H = \mathbf{A}\mathbf{A}^H. \tag{2.83}$$

Hence, the random vector created by the product in (2.81) is distributed as $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{A}\mathbf{A}^H)$. This example shows how we can create a correlated complex Gaussian vector $\mathbf{x}$ from a complex Gaussian vector $\bar{\mathbf{x}}$ with independent entries by multiplying with a matrix, a construction that will be used later in this book.
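A Python sketch of (2.81) to (2.83): it generates $\mathbf{x} = \mathbf{A}\bar{\mathbf{x}}$ for a hypothetical $2 \times 2$ matrix $\mathbf{A}$ and compares the sample covariance with $\mathbf{A}\mathbf{A}^H$. Plain lists are used so the sketch runs without external packages; the matrix, seed, and sample size are assumptions, and in practice a numerical linear algebra library would be the natural choice.

```python
import math
import random

def complex_gaussian(sigma2=1.0):
    """Draw a sample from N_C(0, sigma2) via independent real and imaginary parts."""
    std = math.sqrt(sigma2 / 2.0)
    return complex(random.gauss(0.0, std), random.gauss(0.0, std))

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def a_times_a_hermitian(A):
    """Compute A A^H, whose (i, j) entry is sum_k A[i][k] * conj(A[j][k])."""
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) for rj in A] for ri in A]

random.seed(6)
A = [[1.0 + 0j, 0.5j], [0.3 + 0j, 1.0 - 0.2j]]  # hypothetical 2 x 2 matrix (M = K = 2)
n = 100_000
# x = A x_bar with x_bar ~ N_C(0, I_2), as in (2.81)
samples = [matvec(A, [complex_gaussian(), complex_gaussian()]) for _ in range(n)]

sample_cov = [[sum(x[i] * x[j].conjugate() for x in samples) / n for j in range(2)]
              for i in range(2)]
R = a_times_a_hermitian(A)  # predicted covariance matrix from (2.83)
```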


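Example 2.10. Show that if $\bar{\mathbf{x}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_K)$, then $\mathbf{x} = \mathbf{U}\bar{\mathbf{x}}$ has the same distribution if $\mathbf{U} \in \mathbb{C}^{K \times K}$ is a unitary matrix.

The vector $\mathbf{x}$ is created as in (2.81) with $\mathbf{A} = \mathbf{U}$. The corresponding covariance matrix is computed in (2.83) and becomes $\mathbf{A}\mathbf{A}^H = \mathbf{U}\mathbf{U}^H = \mathbf{I}_K$ since $\mathbf{U}$ is unitary. It follows that $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_K)$, which is the same distribution as $\bar{\mathbf{x}}$ has. The conclusion is that a vector with uncorrelated complex Gaussian entries retains its distribution when multiplied by a unitary matrix.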


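In general, we can define the correlated multivariate complex Gaussian distribution

$$\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{R}) \tag{2.84}$$

for an arbitrary positive definite covariance matrix $\mathbf{R} \in \mathbb{C}^{M \times M}$. The special case considered above corresponds to $\mathbf{R} = \mathbf{A}\mathbf{A}^H$. The PDF is given by

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{\pi^M \det(\mathbf{R})} e^{-\mathbf{x}^H\mathbf{R}^{-1}\mathbf{x}}. \tag{2.85}$$

Such correlated complex Gaussian vectors are circularly symmetric since $f_{\mathbf{x}}(\mathbf{x}) = f_{\mathbf{x}}(\mathbf{x}e^{j\phi})$ for any constant phase-shift $\phi$.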
An important property of complex Gaussian random vectors is that the joint PDF in (2.85) reduces to the one for independent entries in (2.80) if we insert the diagonal covariance matrix $\mathbf{R} = \sigma^2\mathbf{I}_M$. Hence, it is sufficient to assume that all the entries of $\mathbf{x}$ are uncorrelated (i.e., the off-diagonal entries of $\mathbf{R}$ are zero) to get statistical independence as a side-effect. This property follows from the shape of the multivariate complex Gaussian distribution and does not hold for other random distributions in general. We will use this property repeatedly in the book.


Lemma 2.7. If two random variables are jointly complex Gaussian distributed 
and uncorrelated, the variables are also statistically independent. 


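When faced with a correlated complex Gaussian random vector $\mathbf{x}$, removing the correlation through signal processing can sometimes be helpful. Since the covariance matrix $\mathbf{R}$ in (2.84) is positive definite, its square root $\mathbf{R}^{1/2}$ (computed as in Lemma 2.2) is invertible and its inverse will be denoted as $\mathbf{R}^{-1/2}$. Let us define the random vector $\mathbf{n} = \mathbf{R}^{-1/2}\mathbf{x}$. It is complex Gaussian distributed with zero mean and the covariance matrix

$$\mathrm{Cov}\{\mathbf{n}\} = \mathbb{E}\{(\mathbf{n} - \mathbb{E}\{\mathbf{n}\})(\mathbf{n} - \mathbb{E}\{\mathbf{n}\})^H\} = \mathbb{E}\{\mathbf{n}\mathbf{n}^H\} = \mathbf{R}^{-1/2}\underbrace{\mathbb{E}\{\mathbf{x}\mathbf{x}^H\}}_{=\mathbf{R}}\mathbf{R}^{-1/2} = \mathbf{I}_M, \tag{2.86}$$

where we used that $\mathbf{R}^{-1/2}$ is Hermitian. Hence, $\mathbf{n} = \mathbf{R}^{-1/2}\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_M)$ has uncorrelated entries, which are also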
statistically independent thanks to Lemma 2.7. This procedure of removing 
correlation from a random vector is known as whitening, particularly when 
dealing with Gaussian noise. A noise vector with correlated entries is called 
colored noise, and the whitening procedure transforms it into white noise, as 
defined in (2.79). The theory developed in this book will be based on the 
assumption of having white noise, but it can also be applied in the presence 
of colored noise by adding a whitening step at the receiver. 
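A minimal Python sketch of the whitening step for a hypothetical $2 \times 2$ covariance matrix. It assumes the closed-form square root of a $2 \times 2$ Hermitian positive definite matrix, $\mathbf{R}^{1/2} = (\mathbf{R} + s\mathbf{I}_2)/t$ with $s = \sqrt{\det(\mathbf{R})}$ and $t = \sqrt{\mathrm{tr}(\mathbf{R}) + 2s}$, which follows from the Cayley-Hamilton theorem and gives the Hermitian positive definite square root; for larger matrices, a general method such as an eigendecomposition would be needed. The sketch checks that $\mathbf{R}^{-1/2}\mathbf{R}(\mathbf{R}^{-1/2})^H$ equals the identity, as predicted by (2.86).

```python
import cmath

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def hermitian(X):
    """Conjugate transpose of a 2 x 2 matrix."""
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def sqrtm_2x2(R):
    """Hermitian square root of a 2 x 2 Hermitian positive definite matrix.

    Uses R^{1/2} = (R + s I) / t with s = sqrt(det R) and t = sqrt(trace R + 2 s).
    """
    s = cmath.sqrt(R[0][0] * R[1][1] - R[0][1] * R[1][0])
    t = cmath.sqrt(R[0][0] + R[1][1] + 2 * s)
    return [[(R[0][0] + s) / t, R[0][1] / t], [R[1][0] / t, (R[1][1] + s) / t]]

def inv_2x2(S):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

R = [[1.25 + 0j, 0.2 + 0.5j], [0.2 - 0.5j, 1.13 + 0j]]  # hypothetical covariance matrix
R_sqrt = sqrtm_2x2(R)
W = inv_2x2(R_sqrt)                                   # W plays the role of R^{-1/2}
cov_n = mat_mul(mat_mul(W, R), hermitian(W))          # Cov{n} = W R W^H, cf. (2.86)
```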


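Example 2.11. What is the PDF of a multivariate real Gaussian distribution?
If $x_m \sim \mathcal{N}(\mu_m, \sigma^2)$ for $m = 1, \ldots, M$ are independent variables, then the PDF of $\mathbf{x} = [x_1, \ldots, x_M]^T$ is $f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{M/2}} e^{-\frac{\|\mathbf{x}-\boldsymbol{\mu}\|^2}{2\sigma^2}}$. We obtain this expression by taking the product of $M$ PDFs of the kind in (2.63) and defining $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_M]^T$. When the variables are correlated with the covariance matrix $\mathbf{R}$, the resulting PDF is

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{(2\pi)^{M/2}\sqrt{\det(\mathbf{R})}} e^{-\frac{(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}-\boldsymbol{\mu})}{2}}. \tag{2.87}$$

We denote such a real Gaussian distribution as $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{R})$.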
2.2.5 Rayleigh, Exponential, and χ² Distribution


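The PDF of a complex random variable $x$ determines how the magnitude $|x|$ and the argument $\arg(x)$ are distributed. These components are generally statistically dependent, but if $x$ is complex Gaussian, the circular symmetry implies that they are independent. In wireless communications, we are particularly interested in the magnitude since it can describe the amplitude of a signal. We denote the magnitude as $y = |x| \geq 0$ and the argument as $\psi = \arg(x) \in [-\pi, \pi)$, so that $x = ye^{j\psi}$. Since the PDF of the complex Gaussian distribution in (2.67) is defined using the Cartesian form $x = \Re(x) + j\Im(x)$, a change of variables to the polar form consists of two steps: replacing the old variables with the new variables, followed by the multiplication with the magnitude of the Jacobian determinant, $|J(y,\psi)|$. We can compute the latter term based on the definition of Jacobian matrices as

$$\begin{aligned}
|J(y,\psi)| &= \left|\det\begin{pmatrix} \frac{\partial \Re(x)}{\partial y} & \frac{\partial \Re(x)}{\partial \psi} \\ \frac{\partial \Im(x)}{\partial y} & \frac{\partial \Im(x)}{\partial \psi} \end{pmatrix}\right| = \left|\det\begin{pmatrix} \frac{\partial\, y\cos(\psi)}{\partial y} & \frac{\partial\, y\cos(\psi)}{\partial \psi} \\ \frac{\partial\, y\sin(\psi)}{\partial y} & \frac{\partial\, y\sin(\psi)}{\partial \psi} \end{pmatrix}\right| \\
&= \left|\det\begin{pmatrix} \cos(\psi) & -y\sin(\psi) \\ \sin(\psi) & y\cos(\psi) \end{pmatrix}\right| = y\left(\cos^2(\psi) + \sin^2(\psi)\right) = y.
\end{aligned} \tag{2.88}$$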


Using this method, we can rewrite the PDF in (2.67) of the complex Gaussian distribution as a function of the magnitude and argument:

$$f_{y,\psi}(y,\psi) = \frac{y}{\pi\sigma^2} e^{-\frac{y^2}{\sigma^2}} \tag{2.89}$$

for $y \geq 0$, while it is zero for $y < 0$. Since the PDF does not depend on $\psi$, we can conclude that $\psi$ is uniformly distributed between $-\pi$ and $\pi$ (or any other interval of length $2\pi$) and independent of $y$. We can compute the marginal distribution of the magnitude as

$$f_y(y) = \int_{-\pi}^{\pi} f_{y,\psi}(y,\psi)\,\partial\psi = \frac{2y}{\sigma^2} e^{-\frac{y^2}{\sigma^2}} \quad \text{for } y \geq 0. \tag{2.90}$$

This PDF characterizes the variations in the magnitude of a complex Gaussian random variable. It matches what is known as the Rayleigh distribution. Just as the complex Gaussian distribution is characterized by its variance $\sigma^2$, the Rayleigh distribution is characterized by a scale parameter. For the PDF in (2.90), the scale parameter can be identified to be $\sigma/\sqrt{2}$ and, thus, we can express the distribution of the magnitude as $y \sim \mathrm{Rayleigh}(\sigma/\sqrt{2})$. The PDF with $\sigma = 1$ is illustrated in Figure 2.7. When a communication channel is complex Gaussian distributed, it is referred to as Rayleigh fading since the magnitude is Rayleigh distributed. We will return to this later in the book.
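A Python sketch that samples $x \sim \mathcal{N}_{\mathbb{C}}(0,\sigma^2)$, splits it into magnitude and argument, and checks two consequences of (2.89) and (2.90): the sample mean of $y = |x|$ should approach $\frac{\sigma}{\sqrt{2}}\sqrt{\pi/2} = \sigma\sqrt{\pi}/2$ (the standard mean of a Rayleigh distribution with scale $\sigma/\sqrt{2}$), and the sample covariance between $y$ and $\psi$ should be near zero. The latter only checks uncorrelatedness, which is weaker than the independence stated in the text. The value $\sigma^2 = 1$, the seed, and the sample size are assumptions.

```python
import math
import random

random.seed(7)
sigma2, n = 1.0, 100_000
std = math.sqrt(sigma2 / 2.0)
x = [complex(random.gauss(0.0, std), random.gauss(0.0, std)) for _ in range(n)]

magnitudes = [abs(v) for v in x]                  # y = |x|
phases = [math.atan2(v.imag, v.real) for v in x]  # psi = arg(x), in [-pi, pi]

mean_y = sum(magnitudes) / n
mean_y_theory = math.sqrt(sigma2) * math.sqrt(math.pi) / 2  # Rayleigh mean for scale sigma/sqrt(2)
mean_psi = sum(phases) / n                                  # close to 0 for a uniform phase
cov_y_psi = sum(y * p for y, p in zip(magnitudes, phases)) / n - mean_y * mean_psi
```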
When analyzing the SNR of a communication system, we are not interested in the amplitude $y = |x|$ but its square $y^2 = |x|^2$ (the SNR is a ratio between the signal power and noise power). Let us denote this random variable as $z = y^2$. We can obtain the PDF of $z$ by following the same two steps as above: replace the $y$ in (2.90) with $\sqrt{z}$ and then multiply by the magnitude of the Jacobian determinant $|J(z)|$, which is $|\partial y/\partial z| = 1/(2\sqrt{z})$ in this case. Using this method, we obtain

$$f_z(z) = \frac{1}{\sigma^2} e^{-\frac{z}{\sigma^2}} \quad \text{for } z \geq 0, \tag{2.91}$$

while it is zero for $z < 0$. This PDF characterizes the variations in the squared magnitude of a complex Gaussian random variable. It matches what is known as the exponential distribution. This distribution is generally characterized by a so-called rate parameter, which in this case equals $1/\sigma^2$. Hence, we can express the distribution of the squared magnitude as $z \sim \mathrm{Exp}(1/\sigma^2)$. The PDF with $\sigma^2 = 1$ is illustrated in Figure 2.7.

A useful property of the exponential distribution is that

$$\mathbb{E}\{z^n\} = n!\,(\sigma^2)^n \tag{2.92}$$

for any positive integer $n$, where $n!$ denotes the factorial.


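Example 2.12. Suppose $x \sim \mathcal{N}_{\mathbb{C}}(0,\sigma^2)$. What are the mean, quadratic mean, and variance of $|x|^2$?

Since $z = |x|^2 \sim \mathrm{Exp}(1/\sigma^2)$, we can utilize the property in (2.92) to compute the mean, quadratic mean, and variance of $|x|^2$ as follows:

$$\mathbb{E}\{|x|^2\} = \mathbb{E}\{z\} = \sigma^2, \tag{2.93}$$

$$\mathbb{E}\{|x|^4\} = \mathbb{E}\{z^2\} = 2\sigma^4, \tag{2.94}$$

$$\mathrm{Var}\{|x|^2\} = \mathbb{E}\{|x|^4\} - \left(\mathbb{E}\{|x|^2\}\right)^2 = \sigma^4. \tag{2.95}$$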
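The moments in (2.92) to (2.95) can be compared with sample averages in a short Python sketch; $\sigma^2 = 2$, the seed, and the sample size are arbitrary assumptions, so the estimates should approach $\sigma^2 = 2$, $2\sigma^4 = 8$, and $\sigma^4 = 4$. The helper name `exp_moment` is hypothetical.

```python
import math
import random

def exp_moment(n, sigma2):
    """E{z^n} = n! * sigma2^n for z ~ Exp(1/sigma2), following (2.92)."""
    return math.factorial(n) * sigma2 ** n

random.seed(8)
sigma2, n_samples = 2.0, 200_000
std = math.sqrt(sigma2 / 2.0)
# z = |x|^2 with x ~ N_C(0, sigma2): the sum of two squared real Gaussians of variance sigma2/2
z = [random.gauss(0.0, std) ** 2 + random.gauss(0.0, std) ** 2 for _ in range(n_samples)]

mean_z = sum(z) / n_samples                      # compare with (2.93)
quad_mean_z = sum(v * v for v in z) / n_samples  # compare with (2.94)
var_z = quad_mean_z - mean_z ** 2                # compare with (2.95)
```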


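We can also utilize the property in (2.92) when computing mean values that involve an $M$-dimensional complex Gaussian random vector with independent entries: $\mathbf{x} = [x_1, \ldots, x_M]^T \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2\mathbf{I}_M)$. Since $z_m = |x_m|^2 \sim \mathrm{Exp}(1/\sigma^2)$ for $m = 1, \ldots, M$, we can compute the mean, quadratic mean, and variance of the squared norm $\|\mathbf{x}\|^2$ as follows:

$$\mathbb{E}\{\|\mathbf{x}\|^2\} = \sum_{m=1}^{M}\mathbb{E}\{z_m\} = M\sigma^2, \tag{2.96}$$

$$\begin{aligned}
\mathbb{E}\{\|\mathbf{x}\|^4\} &= \mathbb{E}\left\{\left(\sum_{m=1}^{M} z_m\right)^2\right\} = \sum_{m=1}^{M}\mathbb{E}\{z_m^2\} + \sum_{m=1}^{M}\sum_{\substack{n=1\\ n\neq m}}^{M}\mathbb{E}\{z_m\}\mathbb{E}\{z_n\} \\
&= 2M\sigma^4 + M(M-1)\sigma^2\sigma^2 = (M^2+M)\sigma^4,
\end{aligned} \tag{2.97}$$

$$\mathrm{Var}\{\|\mathbf{x}\|^2\} = \mathbb{E}\{\|\mathbf{x}\|^4\} - \left(\mathbb{E}\{\|\mathbf{x}\|^2\}\right)^2 = M\sigma^4. \tag{2.98}$$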
Figure 2.7: Examples of the PDFs of the Rayleigh distribution, exponential distribution, and χ²-distribution.


These results were obtained by utilizing the fact that $\|\mathbf{x}\|^2 = \sum_{m=1}^{M} z_m$ is the sum of $M$ independent random variables with identical exponential distribution. By utilizing the fact that the PDF of a sum of independent random variables is the convolution of the marginal PDFs, one can show that the squared norm has the PDF

$$f_{\|\mathbf{x}\|^2}(x) = \frac{x^{M-1} e^{-\frac{x}{\sigma^2}}}{(\sigma^2)^M (M-1)!} \quad \text{for } x \geq 0, \tag{2.99}$$

while it is zero for $x < 0$. This distribution is often referred to as the $\chi^2$-distribution in the communication literature and denoted as $\chi^2(2M)$, where $2M$ is called the degrees of freedom since $\|\mathbf{x}\|^2$ is the sum of $2M$ squared real Gaussian variables. However, formally speaking, it is only in the special case of $\sigma^2 = 2$ that one obtains that random distribution. Hence, we will refer to (2.99) as the scaled $\chi^2$-distribution in this book. The mean $M\sigma^2$ of $\|\mathbf{x}\|^2$ was computed in (2.96), while the variance $M\sigma^4$ was computed in (2.98). If we set $M = 1$, then the $\chi^2(2M)$-distribution reduces to the exponential distribution. The PDF with $M = 2$ and $\sigma^2 = 1$ is illustrated in Figure 2.7.
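A Python sketch comparing (2.96), (2.98), and (2.99) with simulation for the assumed values $M = 4$ and $\sigma^2 = 1$: the squared norm is built as a sum of $2M$ squared real Gaussian variables of variance $\sigma^2/2$, and a Riemann sum checks that the PDF in (2.99) integrates to approximately one. The helper name `squared_norm_pdf`, the integration step, and the sample size are illustrative choices.

```python
import math
import random

def squared_norm_pdf(x, M, sigma2):
    """Scaled chi-squared PDF (2.99) of ||x||^2 for x ~ N_C(0, sigma2 * I_M)."""
    if x < 0:
        return 0.0
    return x ** (M - 1) * math.exp(-x / sigma2) / (sigma2 ** M * math.factorial(M - 1))

random.seed(9)
M, sigma2, n = 4, 1.0, 50_000
std = math.sqrt(sigma2 / 2.0)
# Each ||x||^2 is a sum of 2M squared real Gaussians (M real parts and M imaginary parts)
norms = [sum(random.gauss(0.0, std) ** 2 for _ in range(2 * M)) for _ in range(n)]

mean_norm = sum(norms) / n                                # compare with M * sigma2 in (2.96)
var_norm = sum((v - mean_norm) ** 2 for v in norms) / n   # compare with M * sigma2^2 in (2.98)

# Riemann sum over [0, 60] to check that the PDF integrates to about one
dx = 0.001
total = sum(squared_norm_pdf(k * dx, M, sigma2) for k in range(1, 60_000)) * dx
```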


2.2.6 Cumulative Distribution Function 


Figure 2.8: The CDF of the Rayleigh distribution for $\sigma^2 = 1$, where the 25% percentile, median, and 75% percentile points are marked.

It is common to compare the realization of a real-valued random variable with a threshold when analyzing the performance of a communication system. Suppose the random variable is $x$ and the threshold is $a$; then the probability $\Pr\{x \leq a\}$ of $x$ taking realizations smaller than or equal to $a$ is important. To characterize how its value depends on the threshold, we can define the cumulative distribution function (CDF) $F_x(a)$ as

$$F_x(a) = \Pr\{x \leq a\} = \int_{-\infty}^{a} f_x(x)\,\partial x, \tag{2.100}$$


which is computed by integrating the PDF from its lower limit (generally from $-\infty$, but we can start from 0 for non-negative random variables) to $a$. The CDF is a non-decreasing function of $a$ since we are integrating the non-negative PDF $f_x(x)$ over an increasing interval. Moreover, it only takes values between 0 and 1, since $F_x(a)$ is the probability of the event $x \leq a$. The CDF provides a full characterization of the random distribution, just as the PDF does; for example, the PDF can be recovered from the CDF by computing the first-order derivative:

$$\frac{\partial}{\partial x}F_x(x) = f_x(x). \tag{2.101}$$

The value of $a$ for which $F_x(a) = 0.5$ is known as the median of the distribution because it is equally likely to obtain a realization above and below it. If the CDF is strictly increasing and continuous, the inverse CDF $F_x^{-1}(y)$ exists and is called the percentile function. We can then compute the median as $F_x^{-1}(0.5)$. The point $F_x^{-1}(0.25)$ is called the 25% percentile since 25% of all random realizations are below it, while the point $F_x^{-1}(0.75)$ is called the 75% percentile since 75% of all random realizations are below it (and 25% are above it). The small and large percentiles are of interest when analyzing a random variable’s worst-case and best-case realizations.

Figure 2.8 shows the CDF of the Rayleigh distribution for $\sigma^2 = 1$. The horizontal axis emphasizes the 25% percentile point $\sqrt{\ln(4/3)}$ where the CDF is 0.25, the median $\sqrt{\ln(2)}$ where the CDF is 0.5, and the 75% percentile point $\sqrt{\ln(4)}$ where the CDF is 0.75. Many different CDF curves can be drawn through these three points; thus, the entire CDF is required to obtain a complete statistical characterization of the Rayleigh distribution. The CDF and percentiles used in the figure are computed as follows.


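Example 2.13. Consider the Rayleigh distribution $x \sim \mathrm{Rayleigh}(\sigma/\sqrt{2})$ in (2.90). What CDF and percentile function does it have?

The PDF is $f_x(x) = \frac{2x}{\sigma^2}e^{-\frac{x^2}{\sigma^2}}$ for $x \geq 0$; thus, the CDF becomes

$$F_x(a) = \int_0^a \frac{2x}{\sigma^2} e^{-\frac{x^2}{\sigma^2}}\,\partial x = \left[-e^{-\frac{x^2}{\sigma^2}}\right]_0^a = 1 - e^{-\frac{a^2}{\sigma^2}} \quad \text{for } a \geq 0. \tag{2.102}$$

The percentile function $F_x^{-1}(y)$ can be obtained by inverting the CDF in (2.102) as

$$y = 1 - e^{-\frac{a^2}{\sigma^2}} \;\Rightarrow\; 1 - y = e^{-\frac{a^2}{\sigma^2}} \;\Rightarrow\; \ln(1-y) = -\frac{a^2}{\sigma^2} \;\Rightarrow\; F_x^{-1}(y) = a = \sigma\sqrt{\ln\left(\frac{1}{1-y}\right)}. \tag{2.103}$$

We can use this function to identify any percentile of the distribution; for example, the median is $F_x^{-1}(0.5) = \sigma\sqrt{\ln(2)}$, the 25% percentile is $F_x^{-1}(0.25) = \sigma\sqrt{\ln(4/3)}$, and the 75% percentile is $F_x^{-1}(0.75) = \sigma\sqrt{\ln(4)}$. These values are indicated on the horizontal axis in Figure 2.8 for $\sigma^2 = 1$.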
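A direct Python transcription of (2.102) and (2.103), with the hypothetical helper names `rayleigh_cdf` and `rayleigh_percentile`. Evaluating the CDF at a percentile should return the original probability, and with $\sigma^2 = 1$ the three printed points correspond to those marked in Figure 2.8.

```python
import math

def rayleigh_cdf(a, sigma2):
    """CDF (2.102) of Rayleigh(sigma/sqrt(2)): 1 - exp(-a^2/sigma2) for a >= 0, else 0."""
    return 1.0 - math.exp(-a * a / sigma2) if a >= 0 else 0.0

def rayleigh_percentile(p, sigma2):
    """Percentile function (2.103): sigma * sqrt(ln(1/(1 - p))) for 0 <= p < 1."""
    return math.sqrt(sigma2 * math.log(1.0 / (1.0 - p)))

sigma2 = 1.0
for p in (0.25, 0.5, 0.75):
    a = rayleigh_percentile(p, sigma2)
    print(f"{p:.0%} percentile: a = {a:.4f}, F(a) = {rayleigh_cdf(a, sigma2):.4f}")
```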


2.2.7 Random Process 


A random continuous-time signal $x(t)$ is called a random process and is a generalization of a multivariate random variable. More precisely, if we take samples of a random process at the $M$ time instances $t_1, \ldots, t_M$ and collect them in a vector

$$\begin{bmatrix} x(t_1) \\ \vdots \\ x(t_M) \end{bmatrix}, \tag{2.104}$$

then we obtain a multivariate random variable.

The random processes considered in this book are wide-sense stationary, which means that their first- and second-order statistics do not change over time. Three specific properties are satisfied for such processes. Firstly, the mean value $\mu = \mathbb{E}\{x(t)\}$ does not depend on the time $t$. Secondly, the variance $\sigma^2 = \mathbb{E}\{|x(t) - \mu|^2\}$ also does not depend on the time $t$. The third property relates to how the random process is correlated in time, measured by the autocorrelation function. The correlation between the samples at times $t_1$ and $t_2$ should only depend on the time lag $t_2 - t_1$ between the samples and not on their individual values. Hence, the autocorrelation function of a wide-sense stationary process is denoted as

$$r(t_2 - t_1) = \mathbb{E}\{x(t_1)x^*(t_2)\}. \tag{2.105}$$


A white random process changes so rapidly with time that $x(t_1)$ and $x(t_2)$ are only correlated when $t_1 = t_2$. This is represented by the autocorrelation function

$$r(t_2 - t_1) = c\,\delta(t_2 - t_1), \tag{2.106}$$

where $c = |\mu|^2 + \sigma^2$ is called the power spectral density and $\delta(t)$ is the Dirac delta function.

A complex Gaussian random process has the property that the vector in (2.104) has a multivariate complex Gaussian distribution, irrespective
of the time instances at which the samples are taken. The noise in wireless 
communications is often modeled as a white complex Gaussian random process. 
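A discrete-time Python sketch of the white-noise model: independent $\mathcal{N}_{\mathbb{C}}(0,\sigma^2)$ samples are used as an assumed stand-in for samples of a white complex Gaussian process, and the autocorrelation in (2.105) is estimated by averaging over time, which presumes that time averages match ensemble averages. The continuous-time Dirac delta in (2.106) has no direct sampled counterpart; here, whiteness shows up as a sample autocorrelation close to $\sigma^2$ at lag zero and close to zero at non-zero lags. The helper name `sample_autocorrelation` is hypothetical.

```python
import math
import random

def sample_autocorrelation(x, lag):
    """Time-average estimate of r at an integer lag: the mean of x[i] * conj(x[i + lag])."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag].conjugate() for i in range(n)) / n

random.seed(10)
sigma2, n = 1.0, 100_000
std = math.sqrt(sigma2 / 2.0)
# Independent N_C(0, sigma2) samples as a discrete-time stand-in for white complex Gaussian noise
x = [complex(random.gauss(0.0, std), random.gauss(0.0, std)) for _ in range(n)]
r = [sample_autocorrelation(x, lag) for lag in range(4)]
```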


2.3 Signal Modeling 


Wireless communication systems transfer data by utilizing electromagnetic 
signals. These signals propagate from the transmitter to the receiver over 
an analog wireless channel that acts as a system that filters the signal. This 
section provides the fundamental connection between the physical continuous- 
time signal models and the simple discrete-time models used in later book 
sections. We will use standard results from signals-and-systems theory to 
establish the connection. 

Suppose we are allowed to communicate using a real-valued passband signal with bandwidth $B$ centered around a carrier frequency $f_c$. For example, a typical scenario in the first 5G deployments is $f_c = 3$ GHz and $B = 100$ MHz. The passband assumption implies that $B < 2f_c$, so that the signal does not contain the near-zero frequency range. In practice, we typically have $B \ll f_c$, as in the given example. Let the transmitted signal be denoted as $z_p(t)$, where $t \in \mathbb{R}$ is the continuous time variable and the subscript $p$ indicates it is a passband signal. The amplitude spectrum of such a signal is sketched in Figure 2.9(a). The signal $z_p(t)$ is real-valued; thus, the amplitude spectrum is symmetric for positive and negative frequencies.

Wireless channels generally have time-varying properties, for example, due 
to the movement of the transmitter, receiver, or objects in the propagation 
environment. However, we can divide the transmission into blocks such that 
the channel is (approximately) time-invariant within each block. Following 
that approach, we assume that the wireless channel can be represented by a 
linear time-invariant (LTI) system. A key property of such systems is that 
the filtering is entirely determined by the real-valued impulse response $g_p(t)$.
Figure 2.9: Sketch of (a) a real-valued passband signal $z_p(t)$ with center frequency $f_c$ and bandwidth $B$ that can be communicated over a wireless channel, and (b) the equivalent complex-valued baseband signal $z(t)$ with bandwidth $B/2$ that can be communicated over the complex baseband representation of the channel. The mathematical relation between the two signals is given in (2.111).


In particular, the output signal $v_p(t)$ is the convolution between the input signal and impulse response:

$$v_p(t) = (g_p * z_p)(t) = \int_{-\infty}^{\infty} g_p(u)\, z_p(t-u)\,\partial u. \tag{2.107}$$

The impulse response must satisfy the technical condition $\int_{-\infty}^{\infty} |g_p(t)|\,\partial t < \infty$ for (2.107) to hold, but this is always the case in wireless communications since otherwise, one could receive more signal energy than was transmitted.
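Numerically, (2.107) is often approximated by sampling both signals with a step $\Delta t$ and replacing the integral with a Riemann sum. The following Python sketch does this with plain lists; the sample spacing, the two-tap example response, and the helper name `convolve` are illustrative assumptions.

```python
def convolve(g, z, dt):
    """Riemann-sum approximation of (2.107) on a grid with spacing dt.

    g[k] and z[m] are samples at times k*dt and m*dt (both starting at t = 0).
    Output index n approximates v_p(n*dt) = sum_k g(k*dt) z((n - k)*dt) dt.
    """
    out = [0.0] * (len(g) + len(z) - 1)
    for k, gk in enumerate(g):
        for m, zm in enumerate(z):
            out[k + m] += gk * zm * dt
    return out

# Hypothetical example: a direct path plus an echo with half the amplitude, one sample later
g = [1.0, 0.5]
z = [1.0, 2.0, 3.0]
v = convolve(g, z, dt=1.0)
print(v)  # [1.0, 2.5, 4.0, 1.5]
```

The output is one sample longer than the input for each extra tap, reflecting how the echo smears the transmitted signal in time.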

The input-output relation in (2.107) is illustrated in Figure 2.10(a). We 
will later add the transmitter and receiver hardware to this model, including 
the additive noise, but we will first reformulate the basic relation. 


2.3.1 Complex Baseband Representation 


To avoid making the communication system design dependent on a particular value of $f_c$, the signal processing algorithms used in wireless communications are developed for an equivalent baseband system where the signals are centered around the zero frequency. If we take the spectrum of the passband signal in Figure 2.9(a) and downshift it to the baseband, we obtain the equivalent signal $z(t)$ whose amplitude spectrum is illustrated in Figure 2.9(b). This is called the complex baseband representation of the signal in Figure 2.9(a). If the hardware is designed to generate baseband signals of this type, we can modulate the signals up to different carrier frequencies at different times (e.g., a mobile phone supports many bands so that it can be used worldwide).

Figure 2.10: Block diagrams of the input-output relations when transmitting a signal over a wireless channel: (a) relation between the transmitted and received passband signals, where $z_p(t)$ passes through the channel $g_p(t)$ and yields $v_p(t) = (g_p * z_p)(t)$; (b) equivalent relation using complex-baseband signals, where $z(t)$ passes through the equivalent channel $g(t)$ and yields $v(t) = (g * z)(t)$. The practical system transmits passband signals but can be equivalently represented in the complex baseband.

We can establish a mathematical connection between $z_p(t)$ and $z(t)$ in the frequency domain by utilizing the Fourier transform $\mathcal{F}\{\cdot\}$. The frequency-domain representation of an arbitrary continuous-time signal $a(t)$ is defined as

$$A(f) = \mathcal{F}\{a(t)\} = \int_{-\infty}^{\infty} a(t) e^{-j2\pi f t}\,\partial t. \tag{2.108}$$

The Fourier transform is generally complex-valued, but it is conjugate symmetric if the signal $a(t)$ is real-valued: $A^*(-f) = A(f)$. This is proved as

$$A^*(-f) = \left(\int_{-\infty}^{\infty} a(t) e^{j2\pi f t}\,\partial t\right)^* = \int_{-\infty}^{\infty} a^*(t) e^{-j2\pi f t}\,\partial t = A(f), \tag{2.109}$$

where the last equality follows from the fact that $a(t) = a^*(t)$ for real-valued signals.

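The frequency-domain representations of the passband signal and baseband signal respectively become $Z_p(f) = \mathcal{F}\{z_p(t)\}$ and $Z(f) = \mathcal{F}\{z(t)\}$ when using the Fourier transform. We can then express the relation shown in Figure 2.9 as

$$Z_p(f) = \frac{Z(f - f_c) + Z^*(-f - f_c)}{\sqrt{2}}. \tag{2.110}$$

The scaling factor $1/\sqrt{2}$ ensures that the passband and baseband signals have the same energy; that is, $\int_{-\infty}^{\infty} |Z_p(f)|^2\,\partial f = \int_{-\infty}^{\infty} |Z(f)|^2\,\partial f$ (the cross-terms vanish because the two shifted spectra in (2.110) do not overlap when $B < 2f_c$). By taking the inverse Fourier transform of both sides of (2.110), it follows that the time-domain signals $z_p(t)$ and $z(t)$ are related as

$$z_p(t) = \frac{z(t)e^{j2\pi f_c t} + z^*(t)e^{-j2\pi f_c t}}{\sqrt{2}} = \sqrt{2}\,\Re\left(z(t)e^{j2\pi f_c t}\right). \tag{2.111}$$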
We notice that the amplitude spectrum of $z(t)$ in Figure 2.9(b) is not symmetric for positive and negative frequencies, which implies that it is a complex-valued signal. Any real-valued passband signal $z_p(t)$ with bandwidth $B$ can be equivalently represented by a complex-valued signal $z(t)$ with bandwidth $B/2$ according to (2.111). The bandwidth is halved, but there are instead both real and imaginary signal dimensions. The signal $z(t)$ has the same total energy as $z_p(t)$, meaning that $\int_{-\infty}^{\infty} |z(t)|^2\,\partial t = \int_{-\infty}^{\infty} |z_p(t)|^2\,\partial t$, but the energy is moved to different frequencies.¹
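A Python sketch of (2.111) for a hypothetical baseband signal, a complex tone $z(t) = A e^{j2\pi f_0 t}$ with $|A| = 1$: the passband samples are formed as $\sqrt{2}\,\Re(z(t)e^{j2\pi f_c t})$, and the average powers of $z(t)$ and $z_p(t)$ are compared over a window that contains whole periods, in line with the footnote on signal powers. The carrier $f_c = 10$ Hz is unrealistically low and was chosen only to keep the number of samples small; all values are assumptions.

```python
import cmath
import math

def passband(z, t, fc):
    """Passband sample from (2.111): z_p(t) = sqrt(2) * Re(z(t) * exp(j 2 pi fc t))."""
    return math.sqrt(2.0) * (z(t) * cmath.exp(2j * math.pi * fc * t)).real

fc, f0, amp = 10.0, 1.0, 0.6 + 0.8j   # hypothetical values; |amp| = 1

def z(t):
    """Hypothetical baseband test signal: a complex tone at frequency f0."""
    return amp * cmath.exp(2j * math.pi * f0 * t)

fs = 1000.0                              # sampling rate, far above fc + f0
times = [k / fs for k in range(1000)]    # one second, i.e., whole periods of both tones
power_baseband = sum(abs(z(t)) ** 2 for t in times) / len(times)
power_passband = sum(passband(z, t, fc) ** 2 for t in times) / len(times)
```

Both averages should equal $|A|^2 = 1$ up to rounding; without the factor $\sqrt{2}$ in (2.111), the passband power would be halved.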

Next, we would like to find a complex baseband representation of the entire output-input relation in (2.107), so we can abstract away the carrier frequency and only analyze the baseband. To this end, we let $G_p(f) = \mathcal{F}\{g_p(t)\}$ denote the frequency response of the system, which determines how the channel filters different frequencies of the input signal. By taking the Fourier transform of both sides of (2.107) and utilizing (2.110), we obtain


$$\begin{aligned} V_p(f) = \mathcal{F}\{v_p(t)\} &= G_p(f)\, Z_p(f) = G_p(f)\, \frac{Z(f - f_c) + Z^*(-f - f_c)}{\sqrt{2}} \\ &= \frac{G_p(f)\, Z(f - f_c) + G_p^*(-f)\, Z^*(-f - f_c)}{\sqrt{2}}. \end{aligned} \tag{2.112}$$

The last equality in (2.112) follows from the fact that $G_p(f) = G_p^*(-f)$ for real-valued systems. Since (2.110) and (2.111) provide a general connection between a passband signal and its equivalent complex-baseband signal, we can define the received signal $v(t)$ in the complex baseband and relate it to the received passband signal as

$$v_p(t) = \sqrt{2}\, \Re\left( v(t)\, e^{\imath 2\pi f_c t} \right), \tag{2.113}$$

$$V_p(f) = \mathcal{F}\{v_p(t)\} = \frac{V(f - f_c) + V^*(-f - f_c)}{\sqrt{2}}, \tag{2.114}$$

where $V(f) = \mathcal{F}\{v(t)\}$. By comparing (2.112) with (2.114), we can identify the Fourier transform of the received baseband signal as

$$V(f - f_c) = G_p(f)\, Z(f - f_c) \quad \Rightarrow \quad V(f) = G_p(f + f_c)\, Z(f). \tag{2.115}$$

Taking the inverse Fourier transform of (2.115) yields

$$v(t) = (g * z)(t) = \int_{-\infty}^{\infty} g(u)\, z(t - u)\, \partial u, \tag{2.116}$$

where the complex baseband representation of the system has the frequency response $G(f) = G_p(f + f_c)$ and impulse response

$$g(t) = g_p(t)\, e^{-\imath 2\pi f_c t}. \tag{2.117}$$


¹If the total signal energy is infinite, we can compare the signal powers and conclude that these are equal. The power of a signal $a(t)$ is computed as $\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |a(t)|^2 \partial t$.
We identify (2.116) as an equivalent way to describe a continuous-time communication channel in the complex baseband. The input-output relation is illustrated in Figure 2.10(b). Note that the complex-baseband terminology only refers to the signals: we have taken the passband signal $z_p(t)$ and downshifted it to the complex-baseband signal $z(t)$. In contrast, the impulse responses in wireless communications are neither passband nor baseband filters. In fact, the wireless medium supports communication at any frequency and bandwidth, and causes varying attenuation and delays to signals in different bands. However, by sending signals confined to a specific frequency range $[f_c - B/2, f_c + B/2]$, we are only using the corresponding part of the wireless medium, while other systems can simultaneously use other parts. The only difference between $g(t)$ in (2.117) and the original impulse response $g_p(t)$ is that it has been downshifted along the frequency axis, so that it filters the baseband signal in the same way as the original channel filters the passband signal.

Without loss of generality, we will consider the complex baseband in the 
remainder of this book, except at a few places where we model the impulse 
response gp(t) of a particular wireless channel and then use (2.117) to obtain 
the equivalent impulse response in the complex baseband. 


2.3.2 From Continuous Time to Discrete Time 


Digital data is described by a sequence of bits. In digital communications, these bits are further represented by a discrete data sequence $\{x[l]\}$ of symbols selected based on the bits, where the integer $l$ is the discrete time index. The symbols are selected from the set of complex numbers $\mathbb{C}$, such that $x[l] \in \mathbb{C}$. More precisely, a modulation and channel coding scheme is utilized to decide how many bits each symbol represents and how much redundancy is introduced to enable error correction in the receiver. We need to create a continuous-time signal $z(t)$ that contains the data symbols $\{x[l]\}$ and can be transmitted as an analog electromagnetic wave over the wireless channel. This is achieved by pulse-amplitude modulation (PAM). We will not explain all the underlying theory but focus on the properties needed to derive the discrete-time model we will use in the remainder of the book.
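As a minimal illustration of mapping bits to complex symbols $x[l] \in \mathbb{C}$, the Python sketch below uses QPSK with one Gray-coded labeling and no channel coding. This particular modulation is an assumption made for the example; the text does not prescribe a specific scheme, and `qpsk_map` is a hypothetical helper.

```python
import numpy as np

def qpsk_map(bits):
    """Map pairs of bits to unit-energy QPSK symbols (one Gray-coded labeling, assumption)."""
    bits = np.asarray(bits).reshape(-1, 2)
    # First bit selects the sign of the real part, second bit the sign of the imaginary part
    real = 1 - 2 * bits[:, 0]
    imag = 1 - 2 * bits[:, 1]
    return (real + 1j * imag) / np.sqrt(2)

x = qpsk_map([0, 0, 0, 1, 1, 1, 1, 0])  # 8 bits -> 4 symbols, 2 bits per symbol
print(x)
```

Each symbol carries 2 bits and has unit magnitude; neighboring constellation points differ in exactly one bit, which is the Gray-labeling property.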

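The essence of PAM is that each of the symbols $\{x[l]\}$ is multiplied by a continuous-time pulse and then transmitted one after the other. We consider PAM with the ideal sinc-pulse²

$$p(t) = \sqrt{B}\, \mathrm{sinc}(Bt) = \frac{\sin(\pi B t)}{\pi \sqrt{B}\, t}, \tag{2.118}$$

which has the Fourier transform

$$P(f) = \mathcal{F}\{p(t)\} = \begin{cases} \frac{1}{\sqrt{B}}, & \text{if } |f| \leq B/2, \\ 0, & \text{if } |f| > B/2. \end{cases} \tag{2.119}$$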


²In the communications and signal processing literature, the sinc function is defined as $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$ for $t \neq 0$ and $\mathrm{sinc}(0) = 1$. Other definitions exist in other contexts.
This baseband pulse has bandwidth $B/2$, can be used as an ideal lowpass filter in the frequency domain, and has unit energy: $\int_{-\infty}^{\infty} |P(f)|^2 \partial f = 1$. An illustration of these functions is provided in Figure 2.11. The sinc-function $\mathrm{sinc}(t)$ oscillates in the time domain with an amplitude that decays as $1/(\pi |t|)$ and has zero-crossings when $t$ is a non-zero integer. Hence, $\sqrt{B}\,\mathrm{sinc}(Bt)$ has zero-crossings when $t$ is a non-zero integer divided by $B$. We will exploit this feature to transmit a new data symbol $x[l]$ every $1/B$ seconds while keeping them separable at the receiver. Any pulse function with these zero-crossings is said to satisfy the Nyquist criterion and could be used instead of the sinc-function, but one can prove that the feasible alternatives have a strictly larger bandwidth than $B/2$. The bandwidth of the transmitted signal will match that of the pulse; thus, we will consider PAM using the most bandwidth-efficient pulse in this book. If we increase $B$, then $\sqrt{B}\,\mathrm{sinc}(Bt)$ will be compressed in the time domain (i.e., having more zero-crossings per second so we can send more data symbols), while it will expand in the frequency domain.
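The zero-crossings and the unit energy of $p(t) = \sqrt{B}\,\mathrm{sinc}(Bt)$ can be verified numerically. The sketch below relies on NumPy's `np.sinc`, which implements the normalized definition $\sin(\pi t)/(\pi t)$ used in this book; the value of $B$ and the integration window are assumptions, and the finite window makes the computed energy slightly smaller than one.

```python
import numpy as np

B = 1.0e6  # bandwidth parameter in Hz (assumption)

def p(t):
    """Unit-energy sinc pulse sqrt(B) sinc(B t); np.sinc(x) = sin(pi x) / (pi x)."""
    return np.sqrt(B) * np.sinc(B * t)

# Zero-crossings at non-zero integer multiples of 1/B
k = np.arange(-5, 6)
print(p(k / B) / np.sqrt(B))  # 1 at k = 0 and numerically 0 elsewhere

# Energy over a long but finite window; the truncated tails make it slightly below 1
t = np.linspace(-2000 / B, 2000 / B, 400001)
dt = t[1] - t[0]
print(np.sum(p(t)**2) * dt)
```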
When using PAM, the continuous-time complex-baseband signal is

$$z(t) = \sum_{k=-\infty}^{\infty} x[k]\, p\!\left(t - \frac{k}{B}\right), \tag{2.120}$$


where we notice that a new symbol is transmitted every 1/B seconds and 
multiplied by a time-delayed version of p(t). It is common to refer to 1/B as 
the symbol time and B as the symbol rate (in addition to being the bandwidth). 
Notably, B complex-valued symbols are transmitted per second, and more 
bandwidth leads to a shorter time between the symbols. The PAM procedure 
is tightly connected to the Nyquist-Shannon sampling theorem [38, Th. 1], 
which can be stated for complex signals as follows [39, Sec. 2.8]. 


Lemma 2.8. If a complex-valued continuous-time signal z(t) only contains 
frequencies in an interval smaller than B Hz, it is entirely determined by a 
series of samples spaced 1/B seconds apart. 


Two commonly considered frequency intervals that satisfy this condition are $-B/2 < f \leq B/2$ and $-B/2 \leq f < B/2$, which can be written in short form as $(-B/2, B/2]$ and $[-B/2, B/2)$, respectively. The interval shrinks to $(-B/2, B/2)$ for real-valued signals since such signals always contain the same positive and negative frequencies. The intuition behind the sampling theorem is that the largest frequency (in magnitude) determines how rapidly the signal can change. If the largest frequency is $B/2$ or $-B/2$ (but not both), then the fastest signal components have a period of $2/B$. We can uniquely capture all signal variations if we sample the signal twice per period (i.e., at a sampling rate of $B$ Hz). This specific sampling rate is known as the Nyquist rate and gives rise to $B$ samples per second. It is also called the critical sampling rate to signify that it is fully acceptable to sample the signal more densely, but it is critically important not to sample more sparsely in time because that will create ambiguity; that is, multiple signals can give rise to the same samples, which is known as aliasing. We are transmitting data at the Nyquist rate in digital communications, and it is the corresponding signal samples that we call “symbols” and select to represent information bits. Since we are dealing with a complex-valued baseband signal, the $B$ samples per second are also complex-valued.

Figure 2.11: The unit-energy sinc function $\sqrt{B}\,\mathrm{sinc}(Bt)$ is shown in the time domain in (a), while the Fourier transform is shown in (b).
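Aliasing can be made concrete with two complex exponentials whose frequencies differ by exactly $B$. In the Python sketch below (with a normalized $B$ and frequencies chosen as assumptions), only $f_1 = 0.3B$ lies in $[-B/2, B/2)$, yet both signals produce identical samples at the rate $B$, so the samples alone cannot tell them apart.

```python
import numpy as np

B = 1.0                  # sampling rate (normalized, assumption)
l = np.arange(8)         # sample indices; sampling instants t = l / B
f1 = 0.3 * B             # frequency inside [-B/2, B/2)
f2 = f1 - B              # frequency outside that interval, shifted by exactly B

s1 = np.exp(2j * np.pi * f1 * l / B)  # samples of exp(j 2 pi f1 t)
s2 = np.exp(2j * np.pi * f2 * l / B)  # samples of exp(j 2 pi f2 t)
print(np.max(np.abs(s1 - s2)))        # numerically zero: the sample sequences coincide

# Between the sampling instants, the two continuous-time signals differ
t_mid = 0.5 / B
print(np.exp(2j * np.pi * f1 * t_mid), np.exp(2j * np.pi * f2 * t_mid))
```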

Figure 2.12(a) shows the pulses that are utilized for transmitting three subsequent symbols in PAM: $p(t) = \sqrt{B}\,\mathrm{sinc}(Bt)$, $p(t - 1/B)$, and $p(t - 2/B)$. More precisely, $p(t)$ is multiplied by $x[0]$, $p(t - 1/B)$ is multiplied by $x[1]$, and $p(t - 2/B)$ is multiplied by $x[2]$, and then these are summed up to create $z(t)$. The symbol values become the amplitudes of the respective pulses, which explains why PAM stands for pulse-amplitude modulation. Figure 2.12(b) exemplifies the resulting PAM signal $z(t)$ in (2.120) with $x[0] = 1$, $x[1] = 0.5$, and $x[2] = -0.5$ (and $x[k] = 0$ for all other $k$). We notice that the duration of each pulse is much larger than the symbol time; thus, each symbol affects the shape of $z(t)$ in a relatively broad time interval. This is an unavoidable side-effect of using pulses with as little bandwidth as possible. Nevertheless, we have $z(k/B) = p(0)\,x[k] = \sqrt{B}\,x[k]$ since the pulses are designed to have zero-crossings at all non-zero integers divided by $B$. This can be observed in Figure 2.12(b), where $z(t)$ intersects the peak values of the respective pulses.
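The example in Figure 2.12(b) can be reproduced with a short Python sketch of (2.120). It assumes a normalized symbol rate $B = 1$ and truncates the sum to the three non-zero symbols; `pam_signal` is a hypothetical helper name. At the symbol instants $t = k/B$, the sketch recovers $z(k/B) = \sqrt{B}\,x[k]$.

```python
import numpy as np

B = 1.0                                # symbol rate and bandwidth (normalized, assumption)
symbols = {0: 1.0, 1: 0.5, 2: -0.5}    # x[0], x[1], x[2]; x[k] = 0 for all other k

def p(t):
    """Unit-energy sinc pulse sqrt(B) sinc(B t), as in (2.118)."""
    return np.sqrt(B) * np.sinc(B * t)

def pam_signal(t):
    """PAM signal z(t) = sum_k x[k] p(t - k/B), as in (2.120), over the non-zero symbols."""
    return sum(x_k * p(t - k / B) for k, x_k in symbols.items())

# At the symbol instants t = k/B, only the k-th pulse is non-zero
for k in range(-1, 4):
    print(k, pam_signal(k / B), np.sqrt(B) * symbols.get(k, 0.0))

# Between the symbol instants, all three pulses contribute
print(pam_signal(0.5 / B))
```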

We have now designed a transmitter that maps the discrete-time symbol sequence $\{x[l]\}$ to a continuous-time signal $z(t)$ that can be transmitted over the complex-baseband system. The transmitter operation is illustrated in Figure 2.13, where it is attached to the channel from Figure 2.10(b).

Next, we will design a receiver that can extract the transmitted discrete-time signals by taking samples of the received signal. The main complication is that thermal noise is added to $v(t)$ in the receiver hardware due to the random motion of free electrons caused by thermal agitation. We model the noise by a white circularly symmetric complex Gaussian random process $w(t)$ with constant power spectral density $N_0$ W/Hz for all (relevant) frequencies.⁴ The Gaussian distribution can be motivated by the central limit theorem in Lemma 2.6 since the random motion of many electrons gives rise to approximately Gaussian randomness. By adding the noise to the channel output $v(t)$ in (2.116), we obtain


$$\begin{aligned} u(t) = v(t) + w(t) &= (g * z)(t) + w(t) \\ &= \sum_{k=-\infty}^{\infty} x[k]\, (g * p)\!\left(t - \frac{k}{B}\right) + w(t), \end{aligned} \tag{2.121}$$


³The sampling rate must be strictly larger than the Nyquist rate if a signal that contains the frequencies $\pm B/2$ should be identifiable after sampling. This can be seen from the fact that Nyquist sampling of a sine signal with frequency $B/2$ can result in all samples being zero because they are taken every time the signal crosses zero. Practical communication signals are never perfectly bandlimited; thus, oversampling is often utilized to avoid aliasing and enable digital filtering that deals with the out-of-band signal components. These implementation details are beyond the scope of this book, where we consider ideal pulses and sampling at the Nyquist rate for conceptual simplicity.
⁴A practical signal cannot have a constant power spectral density for all frequencies because then it would have infinite power. Hence, we assume that the power spectral density is constant for all frequencies relevant in wireless communications but can drop to zero for other frequencies to keep the power finite (this happens in practice for extremely large frequencies).
Figure 2.12: The PAM signal $z(t)$ defined in (2.120) uses time-shifted pulses, as illustrated in (a) for $p(t) = \sqrt{B}\,\mathrm{sinc}(Bt)$. These pulses are multiplied by different symbol values and summed up to create $z(t)$, as shown in (b).

Figure 2.13: The transmitter of a communication system generates a continuous-time signal $z(t)$ from the discrete-time symbol sequence $\{x[l]\}$, using PAM. The receiver undoes this operation by lowpass filtering (to suppress noise) and sampling. The channel in the middle is the same as in Figure 2.10(b).


where the last equality follows from (2.120). The additive noise is spread over all frequencies, while the desired signal $v(t)$ is bandlimited to $|f| \leq B/2$ by design. Hence, we can remove the out-of-band noise by lowpass filtering $u(t)$ without affecting the desired signal.⁵ The sinc-pulse $p(t)$ defined in (2.118) and (2.119) is an ideal lowpass filter that can be used for this purpose. We will filter $u(t)$ by $p(t)$ and take samples of the output at the same rate as the symbols are transmitted; that is, one sample every $1/B$ seconds. We denote the time instances of the samples as $t = l/B$, where $l$ is the integer sample index, and thereby obtain the sampled received signal


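$$\begin{aligned} y[l] &= (p * u)(t)\Big|_{t = l/B} \\ &= \sum_{k=-\infty}^{\infty} x[k]\, (p * g * p)\!\left(t - \frac{k}{B}\right)\Big|_{t = l/B} + (p * w)(t)\Big|_{t = l/B} \\ &= \sum_{k=-\infty}^{\infty} x[k]\, (p * g * p)\!\left(\frac{l - k}{B}\right) + n[l], \end{aligned} \tag{2.122}$$

where the discrete-time noise $n[l]$ can be shown (see Exercise 2.5) to be complex Gaussian distributed and independent for different $l$:

$$n[l] = (p * w)(t)\Big|_{t = l/B} \sim \mathcal{N}_{\mathbb{C}}(0, N_0). \tag{2.123}$$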
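Exercise 2.5 asks for the full argument behind (2.123). As a brief sketch, assuming the white-noise model $\mathbb{E}\{w(t)\, w^*(s)\} = N_0\, \delta(t - s)$ and using that $p(t)$ is real-valued and even, the covariance of two noise samples is

$$\begin{aligned} \mathbb{E}\{n[l]\, n^*[m]\} &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p\!\left(\tfrac{l}{B} - u\right) p\!\left(\tfrac{m}{B} - s\right) \mathbb{E}\{w(u)\, w^*(s)\}\, \partial u\, \partial s \\ &= N_0 \int_{-\infty}^{\infty} p\!\left(\tfrac{l}{B} - u\right) p\!\left(\tfrac{m}{B} - u\right) \partial u = N_0\, (p * p)\!\left(\tfrac{l - m}{B}\right) = N_0\, \mathrm{sinc}(l - m), \end{aligned}$$

where the last step uses $(p * p)(t) = \mathrm{sinc}(Bt)$, which is established in (2.127). The samples are thus uncorrelated with variance $N_0$, and since they are linear functionals of a Gaussian process, they are jointly Gaussian and therefore independent.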
We have now derived the discrete-time system model (2.122) that determines how the sampled received signal $y[l]$ depends on the input symbol sequence $\{x[k]\}$. Hence, we can abstract away the notationally complicated continuous-time description of the communication system and only consider discrete-time models in the remainder of this book.


2.3.3 Basic Wireless Channel Modeling 


Wireless channels have a particular structure that we can utilize to simplify the system model: the received signal is a summation of several attenuated and delayed copies of the transmitted signal (i.e., a superposition of echoes). Suppose the received signal consists of $L$ copies, each having an attenuation $\alpha_i \in [0, 1]$ and a delay $\tau_i > 0$ seconds, for $i = 1, \ldots, L$. The receiver synchronizes its clock to the transmitter by delaying it by $\eta > 0$ seconds to compensate for the propagation delays. The receiver will then observe a superposition of $L$ signal copies that are delayed by $\tau_i - \eta \in \mathbb{R}$, for $i = 1, \ldots, L$. We can write the impulse response of the channel in the passband as

$$g_p(t) = \sum_{i=1}^{L} \alpha_i\, \delta(t + \eta - \tau_i), \tag{2.124}$$

and it then follows from (2.117) that the equivalent impulse response in the complex baseband is

$$g(t) = \sum_{i=1}^{L} \alpha_i\, e^{-\imath 2\pi f_c t}\, \delta(t + \eta - \tau_i). \tag{2.125}$$

Figure 2.14: The passband channel model in (2.124) consists of $L$ components with different attenuation $\alpha_i$ and delay $\tau_i$. This figure illustrates how these components can be connected to different propagation paths.

⁵This operation is also necessary in practice to filter out interference from other wireless systems operating in neighboring frequency bands.
Figure 2.14 illustrates how the L copies can be connected to different prop- 
agation paths in the environment. The delay of a path is closely related to 
the length of the corresponding path, while the attenuation is determined by 
the distance that the signal has traveled (as in free-space propagation) and 
which objects the signal has interacted with along the way. Note that the 
impulse response in the complex baseband contains additional phase-shifts 
that depend on the carrier frequency. 
The channel $g(t)$ appears in (2.122) as the convolution $(p * g * p)(t)$ sampled at time $t = (l - k)/B$. For the model in (2.125), this convolution term becomes

$$(p * g * p)(t) = \sum_{i=1}^{L} \alpha_i\, e^{-\imath 2\pi f_c (\tau_i - \eta)}\, (p * p)(t + \eta - \tau_i) \tag{2.126}$$
by utilizing the fact⁶ that the convolution between an arbitrary function $f(t)$ and the delayed Dirac delta function $e^{-\imath 2\pi f_c t}\, \delta(t - \tau)$ is equal to $e^{-\imath 2\pi f_c \tau} f(t - \tau)$, where $\tau = \tau_i - \eta$ and $f(t) = (p * p)(t)$ in this case. We further notice that $(p * p)(t) = \mathrm{sinc}(Bt)$ since the Fourier transform of $(p * p)(t)$ is

$$\mathcal{F}\{(p * p)(t)\} = P(f)\, P(f) = \begin{cases} \frac{1}{B}, & \text{if } |f| \leq B/2, \\ 0, & \text{if } |f| > B/2, \end{cases} \tag{2.127}$$

which coincides with the Fourier transform of $\mathrm{sinc}(Bt)$. By utilizing this property and (2.126), we can simplify (2.122) as

$$y[l] = \sum_{k=-\infty}^{\infty} x[k] \sum_{i=1}^{L} \alpha_i\, e^{-\imath 2\pi f_c (\tau_i - \eta)}\, \mathrm{sinc}\big((l - k) + B(\eta - \tau_i)\big) + n[l]. \tag{2.128}$$


2.3.4 Discrete Memoryless Channel Model 


The received signal $y[l]$ in (2.128) at time $l$ depends on multiple transmitted symbols, as can be seen from the summation over $k$. Since the symbols were transmitted one after the other, the channel has created intersymbol interference. This happens when the $L$ paths in our channel model have widely different lengths/delays, so that a symbol that reaches the receiver over a short path arrives at the same time as a previous symbol arrives over a longer path. Another way to view it is that the received signal $y[l]$ not only contains the latest transmitted symbol $x[l]$ but also has a memory of previously transmitted symbols (and potentially future symbols due to the non-causal sinc-pulse). The memory effect is undesired and can be combatted in various ways. We will identify a condition for when the memory vanishes.

If all the channel components have roughly the same delay, we can synchronize the receiver by selecting $\eta$ such that $B(\eta - \tau_i) \approx 0$ for all $i$. To alleviate the channel memory, we want the following approximation to hold:

$$\mathrm{sinc}\big((l - k) + B(\eta - \tau_i)\big) \approx \mathrm{sinc}(l - k) = \begin{cases} 1, & \text{if } l = k, \\ 0, & \text{if } l \neq k. \end{cases} \tag{2.129}$$


The approximation is accurate when $B|\eta - \tau_i| \ll 1$ for all $i$, that is, when the differences between the path delays are much smaller than the symbol time $1/B$. Since we can always make this approximation tight by selecting a sufficiently small bandwidth $B$, this is known as the narrowband signal assumption. This result follows from two assumptions that we have made. First, $p(t)$ was selected as the pulse in the PAM since it satisfies the Nyquist criterion; that is, $(p * p)(l/B)$ is zero for all integers $l$ except $l = 0$. Second, the narrowband assumption implies that the channel will not tamper with the Nyquist criterion. We stress that the narrowband assumption can also be valid for large bandwidths in environments with tiny path delay differences (or only one path).
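The effect of the bandwidth on the channel memory can be explored numerically from (2.128), where the coefficient multiplying $x[k]$ depends only on $m = l - k$. The Python sketch below computes these coefficients for a hypothetical two-path channel (attenuations, delays, carrier frequency, and synchronization delay are all illustrative assumptions) and compares the coefficient at $m = 0$ with $h$ in (2.131). With $B = 1$ MHz, the 53.3 ns delay difference is small compared with the symbol time and nearly all energy sits at $m = 0$; with $B = 100$ MHz, the second path lands about five symbols later.

```python
import numpy as np

def isi_taps(alpha, tau, eta, fc, B, m):
    """Coefficients h_m with y[l] = sum_k x[k] h_{l-k} + n[l], following (2.128):
    h_m = sum_i alpha_i exp(-j 2 pi fc (tau_i - eta)) sinc(m + B (eta - tau_i))."""
    alpha = np.asarray(alpha, dtype=float)[:, None]
    tau = np.asarray(tau, dtype=float)[:, None]
    phase = np.exp(-2j * np.pi * fc * (tau - eta))
    return np.sum(alpha * phase * np.sinc(m[None, :] + B * (eta - tau)), axis=0)

# Hypothetical two-path channel; the receiver is synchronized to the first path
alpha = [1.0, 0.6]
tau = [1.0e-6, 1.0533e-6]
eta = 1.0e-6
fc = 3e9
m = np.arange(-3, 8)  # coefficient indices m = l - k

# Narrowband channel response h from (2.131)
h = np.sum(np.asarray(alpha) * np.exp(-2j * np.pi * fc * (np.asarray(tau) - eta)))

for B in [1e6, 100e6]:
    taps = isi_taps(alpha, tau, eta, fc, B, m)
    leakage = np.sum(np.abs(taps[m != 0])**2) / np.sum(np.abs(taps)**2)
    print(f"B = {B / 1e6:.0f} MHz: |h_0 - h| = {abs(taps[m == 0][0] - h):.4f}, "
          f"share of energy outside m = 0: {100 * leakage:.1f}%")
```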


⁶The convolution is computed as $\int_{-\infty}^{\infty} e^{-\imath 2\pi f_c u}\, \delta(u - \tau)\, f(t - u)\, \partial u = e^{-\imath 2\pi f_c \tau} f(t - \tau)$ by using the sifting property of the Dirac delta function.
Figure 2.15: A discrete memoryless SISO channel with input $x[l]$ and output $y[l] = h \cdot x[l] + n[l]$, where $l$ is a discrete time index, $h$ is the channel response, and $n[l]$ is the independent complex Gaussian receiver noise.


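By inserting (2.129) into (2.128), the system model simplifies to

$$y[l] \approx \sum_{k=-\infty}^{\infty} x[k] \sum_{i=1}^{L} \alpha_i\, e^{-\imath 2\pi f_c (\tau_i - \eta)}\, \mathrm{sinc}(l - k) + n[l] = h \cdot x[l] + n[l], \tag{2.130}$$

where $l$ is a discrete-time index, and the channel is now represented by

$$h = \sum_{i=1}^{L} \alpha_i\, e^{-\imath 2\pi f_c (\tau_i - \eta)}. \tag{2.131}$$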


From now on, we will refer to $h \in \mathbb{C}$ as the channel response and note that $\beta = |h|^2$ is the channel gain described in Chapter 1. In some parts of this book, we will utilize $h$ as an arbitrary channel response, while in other parts we will utilize and generalize the specific structure in (2.131).
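To see how the carrier-dependent phase-shifts in (2.131) shape the channel gain, the Python sketch below evaluates $h$ and $\beta = |h|^2$ for a hypothetical pair of equally strong paths with a 1 ns delay difference. The numbers are assumptions chosen so that the paths add constructively at one carrier frequency and cancel at another; `channel_response` is a hypothetical helper.

```python
import numpy as np

def channel_response(alpha, tau, eta, fc):
    """Narrowband channel response h from (2.131)."""
    alpha = np.asarray(alpha, dtype=float)
    tau = np.asarray(tau, dtype=float)
    return np.sum(alpha * np.exp(-2j * np.pi * fc * (tau - eta)))

# Hypothetical example: two paths with equal attenuation and a 1 ns delay difference
alpha = [0.5, 0.5]
tau = [0.0, 1e-9]
eta = 0.0
for fc in [2.0e9, 2.5e9]:
    h = channel_response(alpha, tau, eta, fc)
    beta = abs(h)**2  # channel gain
    print(f"fc = {fc / 1e9:.1f} GHz: h = {h:.3f}, beta = {beta:.3f}")
```

At 2 GHz the two phase-shifts coincide and $\beta = 1$, while at 2.5 GHz they differ by $\pi$ and the paths cancel, so $\beta \approx 0$.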

Interestingly, we can represent the entire continuous-time communication 
system in Figure 2.13 by the simple equation (2.130). This is called the symbol- 
sampled discrete-time representation of the channel and will be used in the 
remainder of this book without loss of generality. A block diagram for this 
channel is given in Figure 2.15, where we also stress that this is a single-input 
single-output (SISO) channel with one input to the channel and one output. 

The type of channel in (2.130) is also known as a discrete memoryless channel since the received signal $y[l]$ only depends on one transmitted signal $x[l]$ and one independent noise realization $n[l]$; there is no memory of previous time instances or impact from later time instances. For this reason, we can just as well drop the time index $l$ and get the system model

$$y = h \cdot x + n. \tag{2.132}$$


When designing the input signal $x$, we often treat it as a random variable. We will let $q$ denote the average signal energy per symbol (which is a measure of signal power), which implies $\mathbb{E}\{|x|^2\} = q$. The system in (2.132) is also known as an additive white Gaussian noise (AWGN) channel.
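A minimal Python simulation of the AWGN channel (2.132) is sketched below. The values of $h$, $q$, and $N_0$ are hypothetical, the input is drawn as $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$ (one possible choice satisfying $\mathbb{E}\{|x|^2\} = q$), and `complex_gaussian` is a hypothetical helper that draws circularly symmetric complex Gaussian samples. The empirical received power should be close to $q|h|^2 + N_0$.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_gaussian(rng, variance, size):
    """Samples of N_C(0, variance): independent real and imaginary parts, each with variance/2."""
    return np.sqrt(variance / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

h = 0.8 * np.exp(1j * 0.3)  # hypothetical channel response
q, N0 = 2.0, 0.5            # hypothetical symbol energy and noise power spectral density
num = 100_000               # number of channel uses

x = complex_gaussian(rng, q, num)
n = complex_gaussian(rng, N0, num)
y = h * x + n               # (2.132), one sample per channel use

print(np.mean(np.abs(x)**2))  # close to q
print(np.mean(np.abs(y)**2))  # close to q|h|^2 + N0
```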
2.4 Performance Metrics


This section explains how much information can be transmitted reliably over 
the discrete memoryless channel in (2.132). We consider the transmission of a 
finite-sized data packet that represents a particular piece of information (e.g., 
an image, a text document, a piece of a video, or a control command). In this 
book, we use the words information and data interchangeably because we 
consider transferring bits from a transmitter to a receiver, while abstracting 
away what those bits might represent. However, we stress that data generally 
refers to the raw sequence of bits considered within a communication system, 
while information is the application-level interpretation of these bits. 
A data packet is characterized by the following: 


• How many symbols the packet contains, which is the number of times we will transmit over the channel in (2.132);

• How many data bits each of these symbols represents, determined by the modulation and coding scheme;

• How large the probability of incorrect decoding is at the receiver.


When transmitting a packet containing a small number of symbols, the probability of incorrect decoding is a major concern. Hence, a common performance metric is the symbol error probability (also called the symbol error rate), which is the probability that an arbitrary symbol $x[l]$ is decoded incorrectly. This metric has many variations, such as the bit error probability and packet error probability. The values of these error probabilities depend on the choice of the modulation and coding scheme, and the SNR of the channel. In each case, one can derive exact or approximate error probability expressions, which often contain the Gaussian Q-function due to the Gaussian noise.
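As an example of such an expression, consider binary phase-shift keying (BPSK), which is an assumption for this sketch rather than a scheme discussed in the text: $x = \pm\sqrt{q}$, and the receiver decides based on the sign of $\Re(y)$ when $h > 0$. The noise component $\Re(n)$ has variance $N_0/2$, which gives the symbol error probability $Q\big(\sqrt{2q|h|^2/N_0}\big)$. The Python sketch below compares this formula with a Monte Carlo simulation for hypothetical parameter values.

```python
import math
import numpy as np

def qfunc(x):
    """Gaussian Q-function Q(x) = P(N(0,1) > x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

rng = np.random.default_rng(1)
q, N0, h = 1.0, 0.25, 1.0  # hypothetical symbol energy, noise PSD, and (real, positive) channel
num = 200_000

bits = rng.integers(0, 2, num)
x = np.sqrt(q) * (1 - 2 * bits)  # BPSK: bit 0 -> +sqrt(q), bit 1 -> -sqrt(q)
n = np.sqrt(N0 / 2) * (rng.standard_normal(num) + 1j * rng.standard_normal(num))
y = h * x + n
bits_hat = (np.real(y) < 0).astype(int)  # minimum-distance decision on the real part

ser_sim = np.mean(bits_hat != bits)
ser_theory = qfunc(math.sqrt(2 * q * abs(h)**2 / N0))
print(ser_sim, ser_theory)
```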

Letting each symbol describe many data bits is desirable, but the error 
probability increases when more bits are represented. Hence, there is a tradeoff 
between low error probability and many data bits per symbol. This tradeoff 
is non-trivial when transmitting packets with a small number of symbols. It 
typically boils down to selecting a non-zero target error probability based 
on experiments (e.g., 0.01) and then selecting the “best” modulation and 
coding scheme that satisfies that target from a predefined list of schemes. 
When an error occurs, we need to retransmit the packet. This tradeoff is 
illustrated in Figure 2.16(a), where there are few errors when transmitting a 
few bit/symbol and many errors when transmitting many bit/symbol. The 
gradual color change shows how the error probability increases gradually. 

Figure 2.16 panels: (a) Small packet; (b) Large packet (horizontal axis: bit/symbol, with the capacity $C$ marked in (b)).

Figure 2.16: The packet error probability increases the more bits are transmitted per symbol. When the packet is small, there is a gradual transition between few and many errors, as shown in (a). However, when the packet is large, the transition is concentrated in a small interval around the channel capacity $C$.

In contrast, when transmitting a packet with many symbols, the error probability can be made negligible by selecting the proper modulation and coding scheme, which renders the error metric superfluous. The “right” scheme should operate close to, but below, the channel capacity. Claude Shannon defined the capacity in the seminal paper [40] from 1948; therefore, it is also known as the Shannon capacity. Figure 2.16(b) illustrates the essence of this result, namely that the transition between having few and many errors in the transmission happens in a small interval around a value $C$ bit/symbol called the capacity when the packet is large. It can be formally defined as follows.


Definition 2.6. The channel capacity of a given channel is the highest number 
of bits per symbol that can be communicated with arbitrarily low error 
probability as the number of symbols in the packet approaches infinity. 


The interpretation of the channel capacity is that we can communicate 
without error when sending long sequences of symbols, if we carefully select 
how many bits each symbol represents. This implies that the gradual colored 
transition interval shown in Figure 2.16(b) vanishes asymptotically so that 
we get a sudden shift between no errors when operating below the capacity C 
and many errors when operating above the capacity. In this context, “long” 
means (at least) 10000 symbols [41], which takes 1 ms to transmit when 
using B = 10 MHz. This is relatively short in practice; thus, many wireless
communication systems operate very close to the capacity. Since one of the 
core motivating factors of multiple antenna communications is to transmit a 
large amount of data in a way that is faster and/or requires less power than 
in single-antenna communications, it is natural to adopt the channel capacity 
as the performance metric in this book. That said, methods to achieve high 
capacity with multiple antennas coincide, to a large extent, with methods 
that provide low error probabilities when transmitting small data packets. 

The unknown randomness of the noise must be combatted to achieve reliable (error-free) communications. When sending long sequences of symbols, many independent realizations of the noise will be observed, and the uncertainty can be averaged out if we code the data in the right way. The channel capacity determines how much data can be coded into the sequence of symbols while enabling the noise effect to average out.


2.4.1 Basic Capacity Results 


Since the capacity represents the highest number of bits per symbol that can 
be communicated without errors, any “bit per symbol” value between zero 
and the capacity can also be used without errors. Each such number is called 
an achievable data rate, an achievable rate, or a rate. 


Definition 2.7. An achievable data rate is a positive number below the channel 
capacity. It is possible to communicate at this rate with arbitrarily low error 
probability as the number of symbols in the packet approaches infinity. 


Although the capacity is of primary interest, there are situations where 
the capacity is unknown. Therefore, it is crucial to find achievable data rates 
that can be used to communicate without error. 

The channel capacity can be rigorously defined for any communication 
channel, but we refer to [40] and [42] for the general details. This book only 
considers the general concept of discrete memoryless channels. For such a 
channel that takes the data symbol x as input and produces y as output, the 
channel capacity takes the following form as proved in [38], [40], [42]. 


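Theorem 2.1. Consider a discrete memoryless channel with input $x \in \mathbb{C}$ and output $y \in \mathbb{C}$, which are two random variables specified by the conditional PDF $f_{y|x}(y|x)$. The channel capacity is

$$C = \max_{f_x(x)} \big( \mathcal{H}(y) - \mathcal{H}(y|x) \big), \tag{2.133}$$

where the maximum is taken with respect to all distributions $f_x(x)$ of the input that are considered feasible. The differential entropy $\mathcal{H}(y)$ is defined as

$$\mathcal{H}(y) = -\int_{y \in \mathbb{C}} \log_2\big(f_y(y)\big)\, f_y(y)\, \partial y, \tag{2.134}$$

using the marginal distribution $f_y(y) = \int_{x \in \mathbb{C}} f_{y|x}(y|x)\, f_x(x)\, \partial x$ of $y$, and the conditional differential entropy $\mathcal{H}(y|x)$ is defined as

$$\mathcal{H}(y|x) = -\int_{y \in \mathbb{C}} \int_{x \in \mathbb{C}} \log_2\big(f_{y|x}(y|x)\big)\, f_{y|x}(y|x)\, f_x(x)\, \partial x\, \partial y. \tag{2.135}$$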


We note that all the integrals in Theorem 2.1 are computed over the entire complex plane, which is the same as considering a double integral where both the real and imaginary parts are integrated from $-\infty$ to $+\infty$. The capacity in (2.133) is given by the difference between two terms: $\mathcal{H}(y) - \mathcal{H}(y|x)$. The differential entropy $\mathcal{H}(y)$ measures our surprisal when observing a realization of the random variable $y$ at the receiver, which also measures the amount of unknown information that the variable conveys. The differential entropy can take any value from $-\infty$ to $+\infty$, where a larger value implies a larger surprisal. Similarly, $\mathcal{H}(y|x)$ measures the amount of additional information we obtain by observing $y$ if we already know $x$. It holds that $\mathcal{H}(y) \geq \mathcal{H}(y|x)$ since observing $x$ cannot increase our surprisal when we later observe $y$, but it can usually reduce the surprisal substantially. Hence, $\mathcal{H}(y) - \mathcal{H}(y|x) \geq 0$ and the channel capacity must be greater than or equal to zero.

Figure 2.17: The left circle (green) represents the random variable $x$ and the right circle (red) represents the random variable $y$. The circle areas match the respective differential entropies $\mathcal{H}(x)$ and $\mathcal{H}(y)$, while the intersection is the mutual information $\mathcal{I}(x; y)$ that can be computed in the two ways described in (2.137). The capacity is the maximum mutual information; that is, the information that is contained in both the transmitted signal $x$ and the received signal $y$.

More generally, the differential entropy of a sequence $x_1, \ldots, x_L$ of random variables can be expressed using the following chain rule:

$$\mathcal{H}(x_1, \ldots, x_L) = \sum_{l=1}^{L} \mathcal{H}(x_l \,|\, x_1, \ldots, x_{l-1}). \tag{2.136}$$

Since conditioning cannot increase the surprisal, the $l$th term in the sum can be upper bounded by $\mathcal{H}(x_l)$. It follows that $\mathcal{H}(x_1, \ldots, x_L) \leq \sum_{l=1}^{L} \mathcal{H}(x_l)$, where equality is achieved if and only if the random variables are independent.
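For Gaussian random variables, this bound can be evaluated in closed form. The Python sketch below uses the standard expression $\log_2 \det(\pi e \mathbf{R})$ for the differential entropy of a complex Gaussian vector $\mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{R})$ (the vector counterpart of the equality case in Lemma 2.9, stated here as an assumption since this section only treats scalars) together with a hypothetical covariance matrix. Correlated variables give a joint entropy strictly below the sum of the marginal entropies, while uncorrelated (and hence independent) Gaussian variables achieve equality.

```python
import numpy as np

def gaussian_entropy(R):
    """Differential entropy (in bits) of x ~ N_C(0, R), computed as log2 det(pi e R)."""
    R = np.atleast_2d(R)
    sign, logdet = np.linalg.slogdet(np.pi * np.e * R)
    return logdet / np.log(2)

# Hypothetical covariance matrix of two correlated complex Gaussian variables
R = np.array([[1.0, 0.6], [0.6, 1.0]])
joint = gaussian_entropy(R)
sum_of_marginals = sum(gaussian_entropy(R[l, l]) for l in range(2))
print(joint, sum_of_marginals)  # joint < sum, since the variables are correlated

# Zero correlation (independence for Gaussians): the bound holds with equality
R_ind = np.diag([1.0, 1.0])
print(gaussian_entropy(R_ind), 2 * gaussian_entropy(1.0))
```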

Figure 2.17 shows a Venn diagram where the circles represent the random variables $x$ and $y$, and their areas equal the respective differential entropies $\mathcal{H}(x)$ and $\mathcal{H}(y)$. The intersection between the circles determines the ability to extract information about the transmitted signal $x$ from observing the received signal $y$. The area of the intersection is $\mathcal{H}(y) - \mathcal{H}(y|x)$. If we select the input distribution $f_x(x)$ to maximize this area, then it equals the channel capacity $C$ in Theorem 2.1. There is an important statistical symmetry in this figure, which implies that the intersection area can also be expressed as $\mathcal{H}(x) - \mathcal{H}(x|y)$. This expression has an intuitive interpretation: the entropy/uncertainty about the transmitted signal $x$ minus the entropy/uncertainty remaining after observing the received signal $y$. The difference is the knowledge that we learned from our observation. It is called the mutual information since it measures the common information contained in the random variables, and we can denote it as

$$\begin{aligned} \mathcal{I}(x; y) &= \mathcal{H}(y) - \mathcal{H}(y|x) \\ &= \mathcal{H}(x) - \mathcal{H}(x|y). \end{aligned} \tag{2.137}$$


The capacity is the maximum mutual information that can be achieved. 


Example 2.14. What is the channel capacity if $x$ and $y$ are independent?

In this case, the conditional PDF that determines the capacity reduces to $f_{y|x}(y|x) = f_y(y)$, which is the marginal PDF of the output $y$. The conditional differential entropy in (2.135) can now be computed as

$$\begin{aligned} \mathcal{H}(y|x) &= -\int_{y \in \mathbb{C}} \int_{x \in \mathbb{C}} \log_2\big(f_y(y)\big)\, f_y(y)\, f_x(x)\, \partial x\, \partial y \\ &= -\int_{y \in \mathbb{C}} \log_2\big(f_y(y)\big)\, f_y(y)\, \partial y \underbrace{\int_{x \in \mathbb{C}} f_x(x)\, \partial x}_{=1} = \mathcal{H}(y). \end{aligned} \tag{2.138}$$

The capacity in (2.133) becomes zero in this case since $\mathcal{H}(y) = \mathcal{H}(y|x)$, so there is no intersection between the circles in the Venn diagram in Figure 2.17. Consequently, the ability to transfer information lies in the statistical dependence between the random variables at the input and output of the channel.


To compute the capacity in (2.133), we need to identify the PDF $f_x(x)$ of the input $x$ that maximizes $\mathcal{H}(y) - \mathcal{H}(y|x)$. This is the same as finding an optimal modulation and coding scheme. Theorem 2.1 says we can only select distributions that are “considered feasible”, so we must specify some requirements on $x$. It is common to consider all distributions for which the symbol power $\mathbb{E}\{|x|^2\}$ does not exceed a given maximum power. To find the optimal PDF, we need the following key result that says which distribution maximizes our surprisal [42], [1, Lemma B.20].


Lemma 2.9. For any continuous random variable $z \in \mathbb{C}$ with $\mathbb{E}\{|z|^2\} = p$, the differential entropy of $z$ is upper bounded as

$$\mathcal{H}(z) \leq \log_2(e \pi p), \tag{2.139}$$

where $e \approx 2.71828$ is Euler's number. Equality is achieved in (2.139) if and only if $z \sim \mathcal{N}_{\mathbb{C}}(0, p)$; that is, the complex Gaussian distribution has the largest possible differential entropy.
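Lemma 2.9 can be checked by Monte Carlo integration, since $\mathcal{H}(z) = -\mathbb{E}\{\log_2 f_z(z)\}$. The Python sketch below estimates this expectation for $z \sim \mathcal{N}_{\mathbb{C}}(0, p)$ with a hypothetical $p$ and compares it with $\log_2(e\pi p)$. For contrast, it also evaluates the entropy $\log_2(2\pi p)$ of a uniform distribution on a disk with the same power (a standard closed form, not derived in the text), which is smaller since $2 < e$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 2.0       # hypothetical power E{|z|^2}
num = 200_000

# z ~ N_C(0, p): independent real and imaginary parts, each with variance p/2
z = np.sqrt(p / 2) * (rng.standard_normal(num) + 1j * rng.standard_normal(num))

# H(z) = -E{log2 f(z)} with the density f(z) = exp(-|z|^2 / p) / (pi p)
log2_pdf = -np.abs(z)**2 / (p * np.log(2)) - np.log2(np.pi * p)
H_mc = -np.mean(log2_pdf)
H_closed = np.log2(np.e * np.pi * p)  # right-hand side of (2.139)

# Uniform distribution on a disk with E{|z|^2} = p has entropy log2(2 pi p)
H_disk = np.log2(2 * np.pi * p)
print(H_mc, H_closed, H_disk)
```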
For the kind of discrete memoryless channel in (2.132) and Figure 2.15, we have $y = hx + n$, where $h$ is a deterministic scalar, $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$, and the signal $x$ has the symbol power $\mathbb{E}\{|x|^2\} = q$. Hence, the feasible input distributions are all $f_x(x)$ that satisfy $\mathbb{E}\{|x|^2\} = q$. The choice of input distribution only affects $\mathcal{H}(y)$ because $x$ is known in $\mathcal{H}(y|x)$; thus, we want to select the distribution of $x$ to maximize $\mathcal{H}(y)$. Since the signal and noise are independent, we obtain


$$\mathbb{E}\{|y|^2\} = \mathbb{E}\{|x|^2\}|h|^2 + \mathbb{E}\{|n|^2\} = q|h|^2 + N_0. \tag{2.140}$$

We can utilize the result in (2.139) to conclude that

$$\mathcal{H}(y) \leq \log_2\big(e\pi(q|h|^2 + N_0)\big) \tag{2.141}$$


with equality if and only if $y \sim \mathcal{N}_{\mathbb{C}}(0, q|h|^2 + N_0)$. This maximum entropy is achieved when $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$, because $hx + n$ is then a sum of independent complex Gaussian variables; thus, we have found the input distribution corresponding to the maximum in the capacity expression in (2.133). This is called the capacity-achieving input distribution.

To obtain a closed-form capacity expression, it remains to compute $\mathcal{H}(y|x)$. When $x$ is known, the only randomness that remains in $y = hx + n$ is that of the noise $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ since $h$ is deterministic, thus

$$\mathcal{H}(y|x) = \mathcal{H}(n) = \log_2(e\pi N_0), \tag{2.142}$$

where the last equality follows from Lemma 2.9. As a final step, we notice that

$$C = \log_2\big(e\pi(q|h|^2 + N_0)\big) - \log_2(e\pi N_0) = \log_2\left(1 + \frac{q|h|^2}{N_0}\right). \tag{2.143}$$
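The last two steps can be checked numerically. The Python sketch below (standard library only; the values of $q$, $h$, and $N_0$ are arbitrary examples of our choosing) evaluates $\mathcal{H}(y) - \mathcal{H}(y|x)$ using the Gaussian entropies in (2.141) and (2.142) and compares the result with the closed form in (2.143).

```python
import math

def capacity_from_entropies(q: float, h: complex, N0: float) -> float:
    """H(y) - H(y|x) for y = h*x + n with x ~ N_C(0, q) and n ~ N_C(0, N0)."""
    H_y = math.log2(math.e * math.pi * (q * abs(h) ** 2 + N0))   # (2.141) with equality
    H_y_given_x = math.log2(math.e * math.pi * N0)               # (2.142)
    return H_y - H_y_given_x

def capacity_closed_form(q: float, h: complex, N0: float) -> float:
    """Closed-form capacity log2(1 + q|h|^2 / N0) in bit/symbol, cf. (2.143)."""
    return math.log2(1 + q * abs(h) ** 2 / N0)

q, h, N0 = 2.0, 0.5 + 0.5j, 0.25   # arbitrary example values
print(capacity_from_entropies(q, h, N0), capacity_closed_form(q, h, N0))
```

With these example values, $q|h|^2/N_0 = 4$, so both functions return $\log_2(5) \approx 2.32$ bit/symbol; the factor $e\pi$ cancels in the difference of entropies.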


We can summarize the capacity of an AWGN channel as follows. 


Corollary 2.1. Consider the discrete memoryless channel in Figure 2.15 with input $x \in \mathbb{C}$ and output $y \in \mathbb{C}$ given by

$$y = hx + n, \tag{2.144}$$

where $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. Suppose the input distribution is feasible whenever the symbol power satisfies $\mathbb{E}\{|x|^2\} \leq q$ and $h \in \mathbb{C}$ is a constant known at the output. The channel capacity is

$$C = \log_2\left(1 + \frac{q|h|^2}{N_0}\right) \quad \text{bit/symbol} \tag{2.145}$$

and is achieved when the input is distributed as $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$.


The channel capacity in (2.145) is expressed in bit per symbol, but many 
equivalent units appear in the communication literature: bit per sample, bit 
per channel use, bit/s/Hz, and bit per complex degree of freedom. 
The complex Gaussian input distribution creates a continuous signal constellation, where the transmitted signal $x$ can take any value in $\mathbb{C}$. To showcase how a capacity-achieving system operates, we consider the transmission of packets containing $N$ symbols. For a capacity of $C$ bit/symbol, we can convey $NC$ data bits per packet. Hence, $2^{NC}$ different potential data sequences can be communicated. We then need to create a codebook containing $2^{NC}$ different symbol sequences, each of which is called a codeword and represents one of the $2^{NC}$ data sequences. When we want to transfer a packet containing specific data, we transmit the corresponding codeword from the codebook. The receiver’s task is to determine which of the $2^{NC}$ codewords was most likely to have been transmitted. With the capacity-achieving complex Gaussian input distribution, each codeword is generated by taking $N$ independent and identically distributed (i.i.d.) realizations from $\mathcal{N}_{\mathbb{C}}(0, q)$. This is called a Gaussian codebook. The codebook generation is done once and for all when designing the communication system. The codewords must be stored in the transmitter to enable encoding (i.e., transmitting the correct codeword) and in the receiver to facilitate decoding (i.e., identifying which codeword was sent). More precise details can be found in [42, Ch. 10]. The channel capacity is only achieved as the packet length $N \to \infty$, which makes this communication method impractical: the complexity of finding the correct codeword and the storage requirements for the codewords grow exponentially with $N$.
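To make the codebook idea concrete, the Python sketch below builds a toy Gaussian codebook with hypothetical parameters: $N = 8$ symbols and $C = 1$ bit/symbol, so it holds $2^{NC} = 256$ codewords, each consisting of $N$ i.i.d. $\mathcal{N}_{\mathbb{C}}(0, q)$ entries. The receiver picks the codeword $c$ that minimizes $\sum_i |y_i - h c_i|^2$, which is the maximum-likelihood rule for this AWGN channel when all codewords are equally likely and $h$ is known. The function names are our own, and the sketch only illustrates the principle; the exponential growth of the list with $N$ is exactly what makes the approach impractical.

```python
import math
import random

def complex_gaussian(rng: random.Random, var: float) -> complex:
    """One sample from N_C(0, var): real and imaginary parts each have variance var/2."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

def make_gaussian_codebook(N: int, C: float, q: float, seed: int = 1):
    """Generate 2^(N*C) codewords of N i.i.d. N_C(0, q) symbols (done once, offline)."""
    rng = random.Random(seed)
    M = round(2 ** (N * C))
    return [[complex_gaussian(rng, q) for _ in range(N)] for _ in range(M)]

def ml_decode(y, codebook, h: complex) -> int:
    """Index of the codeword c minimizing sum_i |y_i - h*c_i|^2."""
    def distance(c):
        return sum(abs(yi - h * ci) ** 2 for yi, ci in zip(y, c))
    return min(range(len(codebook)), key=lambda m: distance(codebook[m]))

# Hypothetical toy setup: N = 8 symbols, C = 1 bit/symbol -> 2^8 = 256 codewords
N, C, q, h, N0 = 8, 1.0, 1.0, 1.0, 0.1
codebook = make_gaussian_codebook(N, C, q)
rng = random.Random(7)
sent = 42                                    # index of the data sequence to convey
y = [h * c + complex_gaussian(rng, N0) for c in codebook[sent]]
print(len(codebook), ml_decode(y, codebook, h))
```

Doubling $N$ to 16 at the same rate would already require $2^{16} = 65536$ stored codewords, and the exhaustive search in ml_decode grows accordingly.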

In practice, the capacity-achieving system operation is approximated by imposing a structure that alleviates the need for storing the codewords and simplifies the encoding/decoding. It is common to utilize a discrete signal constellation where each symbol $x$ can only take values on a square grid containing $2^{\bar{C}}$ points, where $\bar{C}$ is the closest even integer above $C$. This is called quadrature amplitude modulation (QAM). To avoid attempting to transfer more data than the capacity allows, only a subset of $2^{NC}$ symbol sequences among the $2^{N\bar{C}}$ possible sequences is utilized, where the ratio $C/\bar{C}$ is called the coding rate. The subset is selected by a channel coding scheme designed to minimize the risk of mixing up the selected sequences at the receiver side (i.e., minimizing the probability of decoding error).

To give a concrete example, the 5G NR standard utilizes the modulation formats 4-QAM, 16-QAM, 64-QAM, and 256-QAM along with the low-density parity-check (LDPC) coding scheme, where the coding is designed to operate close to the capacity while enabling efficient encoding and decoding.⁷ Figure 2.18 exemplifies 28 predefined combinations of modulation and coding schemes (MCSs) from [43, Table 5.1.3.1-2], where the first column is an index that the transmitter and receiver can use when agreeing upon which combination to utilize. The second column describes the modulation format, the third column is the coding rate, and the fourth column is the number of bits per symbol. If the channel capacity were 4 bit/symbol, then we should search the table for the closest number that is smaller than the capacity, which in this case is index 15 that provides 3.9 bit/symbol. Hence, 64-QAM should be used to transmit $\log_2(64) = 6$ codeword bits per symbol, of which a fraction 0.65 contains data bits, resulting in $6 \cdot 0.65 = 3.9$ bit/symbol. We will not consider any of these specific details in the remainder of this book but utilize the channel capacity as the performance metric while keeping in mind that there are practical ways to communicate at data rates close to the capacity.

Index | Modulation format | Coding rate | bit/symbol
0     | 4-QAM             | 0.12        | 0.24
1     | 4-QAM             | 0.19        | 0.38
2     | 4-QAM             | 0.30        | 0.60
3     | 4-QAM             | 0.44        | 0.88
4     | 4-QAM             | 0.59        | 1.18
5     | 16-QAM            | 0.37        | 1.48
6     | 16-QAM            | 0.42        | 1.70
7     | 16-QAM            | 0.48        | 1.91
8     | 16-QAM            | 0.54        | 2.16
9     | 16-QAM            | 0.60        | 2.41
10    | 16-QAM            | 0.64        | 2.57
11    | 64-QAM            | 0.46        | 2.73
12    | 64-QAM            | 0.50        | 3.03
13    | 64-QAM            | 0.55        | 3.32
14    | 64-QAM            | 0.60        | 3.61
15    | 64-QAM            | 0.65        | 3.90
16    | 64-QAM            | 0.70        | 4.21
17    | 64-QAM            | 0.75        | 4.52
18    | 64-QAM            | 0.80        | 4.82
19    | 64-QAM            | 0.86        | 5.12
20    | 256-QAM           | 0.67        | 5.33
21    | 256-QAM           | 0.69        | 5.55
22    | 256-QAM           | 0.74        | 5.89
23    | 256-QAM           | 0.78        | 6.23
24    | 256-QAM           | 0.82        | 6.57
25    | 256-QAM           | 0.86        | 6.91
26    | 256-QAM           | 0.90        | 7.16
27    | 256-QAM           | 0.93        | 7.41

Figure 2.18: The list of 28 MCS combinations utilized in the 5G NR standard. The list is adapted from [43, Table 5.1.3.1-2].

⁷Polar codes are also used in 5G NR, but for the transmission of small blocks.
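The table lookup described above can be sketched in Python as follows. The rows are copied from Figure 2.18 and the selection rule (the largest bit/symbol value that is smaller than the capacity) follows the text; returning None when even index 0 is too demanding is our own convention. This only illustrates the lookup, not how a 5G NR scheduler actually selects the MCS.

```python
# Rows of Figure 2.18: (index, modulation format, coding rate, bit/symbol)
MCS_TABLE = [
    (0, "4-QAM", 0.12, 0.24), (1, "4-QAM", 0.19, 0.38), (2, "4-QAM", 0.30, 0.60),
    (3, "4-QAM", 0.44, 0.88), (4, "4-QAM", 0.59, 1.18), (5, "16-QAM", 0.37, 1.48),
    (6, "16-QAM", 0.42, 1.70), (7, "16-QAM", 0.48, 1.91), (8, "16-QAM", 0.54, 2.16),
    (9, "16-QAM", 0.60, 2.41), (10, "16-QAM", 0.64, 2.57), (11, "64-QAM", 0.46, 2.73),
    (12, "64-QAM", 0.50, 3.03), (13, "64-QAM", 0.55, 3.32), (14, "64-QAM", 0.60, 3.61),
    (15, "64-QAM", 0.65, 3.90), (16, "64-QAM", 0.70, 4.21), (17, "64-QAM", 0.75, 4.52),
    (18, "64-QAM", 0.80, 4.82), (19, "64-QAM", 0.86, 5.12), (20, "256-QAM", 0.67, 5.33),
    (21, "256-QAM", 0.69, 5.55), (22, "256-QAM", 0.74, 5.89), (23, "256-QAM", 0.78, 6.23),
    (24, "256-QAM", 0.82, 6.57), (25, "256-QAM", 0.86, 6.91), (26, "256-QAM", 0.90, 7.16),
    (27, "256-QAM", 0.93, 7.41),
]

def select_mcs(capacity: float):
    """Return the row with the largest bit/symbol value strictly below the capacity.

    Returns None if even index 0 would exceed the capacity (our own convention).
    """
    feasible = [row for row in MCS_TABLE if row[3] < capacity]
    return max(feasible, key=lambda row: row[3]) if feasible else None

print(select_mcs(4.0))   # (15, '64-QAM', 0.65, 3.9)
```

For a capacity of 4 bit/symbol, the function returns index 15, matching the worked example in the text.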

We can rewrite the capacity expression in (2.145) taking the following 
three facts into account: 


1. B symbols are transmitted per second; 
2. The channel gain is $\beta = |h|^2$;


3. The symbol power q is measured in energy per symbol. It can be expressed 
as q = P/B, where P is the transmit power in Watt and B is the number 
of symbols per second. 


The first fact means we can multiply (2.145) by $B$ to change the unit from bit/symbol to bit/s. This is why the unit bit/symbol is also equivalent to the unit bit/s/Hz. The latter two facts can be used to make changes of variables, leading to

$$C = B \log_2\left(1 + \frac{P\beta}{BN_0}\right) \quad \text{bit/s}. \tag{2.146}$$

We notice that the channel capacity is given by the bandwidth multiplied by the base-two logarithm of one plus

$$\mathrm{SNR} = \frac{P\beta}{BN_0}, \tag{2.147}$$

which was previously stated in (1.13). Hence, the channel capacity is tightly connected to the SNR, just as many other communication performance metrics.
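The three facts can be turned into a short computation. The Python sketch below evaluates (2.146) directly and, as a cross-check, through the per-symbol capacity (2.145) with $q = P/B$ and $|h|^2 = \beta$. The link parameters in the usage example are hypothetical; $N_0 = 4 \cdot 10^{-21}$ W/Hz is roughly the thermal noise level of −174 dBm/Hz.

```python
import math

def capacity_bps(B: float, P: float, beta: float, N0: float) -> float:
    """Capacity (2.146) in bit/s, with SNR = P*beta/(B*N0) as in (2.147)."""
    snr = P * beta / (B * N0)
    return B * math.log2(1 + snr)

def capacity_bps_via_symbols(B: float, P: float, beta: float, N0: float) -> float:
    """Same quantity built from the three facts and the per-symbol capacity (2.145)."""
    q = P / B                                   # fact 3: energy per symbol
    c_symbol = math.log2(1 + q * beta / N0)     # (2.145) with |h|^2 = beta (fact 2)
    return B * c_symbol                         # fact 1: B symbols per second

# Hypothetical link: 10 MHz, 0.1 W, channel gain -100 dB, N0 = 4e-21 W/Hz
B, P, beta, N0 = 10e6, 0.1, 1e-10, 4e-21
print(capacity_bps(B, P, beta, N0) / 1e6, "Mbit/s")
print(capacity_bps_via_symbols(B, P, beta, N0) / 1e6, "Mbit/s")
```

In this example, the SNR is 250 and both functions give about 79.7 Mbit/s.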


2.5 Estimation Theory 


The goal of estimation is to compute a good approximate value of an unknown 
parameter based on measurements. The estimation procedure is particularly 
challenging when the measurements are limited and noisy. There are two 
main subfields of estimation theory [44]. In classical estimation, the unknown 
variable is deterministic and, thus, has the same constant value forever. In 
Bayesian estimation, the unknown variable is instead a realization of a random 
variable with a known statistical distribution (also known as the prior). 

In wireless communications, the transmission of very large data packets is 
implicitly assumed whenever the channel capacity is used as the performance 
metric. Hence, unknown variables that are constant throughout the transmis- 
sion are relatively easy to estimate; for example, a negligibly small preamble 
can be attached to the packet to obtain the necessary measurements. In
contrast, unknown variables that take different values during the transmission 
must be estimated using few measurements because there is insufficient time 
to make extensive measurements. This can be modeled as if the unknown 
variable takes different realizations from the same random variable at different 
times. For this reason, Bayesian estimation will mostly be considered in this 
book. It is generally assumed that the statistics are known, but Section 2.6 
describes how they can be estimated in practice. 

The general principle is that we want to compute an estimate of a realization $h$ of a random variable. The available information is an observation $y$ that is statistically connected to the unknown variable. More precisely, we have measured the current value of $y$ and know the conditional PDF $f_{h|y}(h|y)$ of $h$ given the value of $y$. There is a rich theory for Bayesian estimation of both real and complex variables and different ways of measuring what is a good approximate value [44]. We will only consider the mean-squared error (MSE) as the performance metric for the estimation.


Definition 2.8. Consider a random variable $h \in \mathbb{C}$ and let $\hat{h}(y)$ denote an arbitrary estimator of $h$ based on the observation $y \in \mathbb{C}$. The estimation error is $h - \hat{h}(y)$ and the MSE is defined as

$$\mathrm{MSE}_h = \mathbb{E}\{|h - \hat{h}(y)|^2\}, \tag{2.148}$$

by taking the average squared estimation error.


Lemma 2.10. The estimator that minimizes the MSE in (2.148) is called the minimum mean-squared error (MMSE) estimator. It can be computed as

$$\hat{h}_{\mathrm{MMSE}}(y) = \mathbb{E}\{h|y\} = \int_{h \in \mathbb{C}} h f_{h|y}(h|y)\,\partial h, \tag{2.149}$$

where $f_{h|y}(h|y)$ is the conditional PDF of $h$ given the observation $y$.


The MMSE estimator is the conditional mean of $h$ given $y$. By definition, it minimizes the MSE, which equals the variance of the estimation error since that error has zero mean. Since the estimator depends on the conditional PDF $f_{h|y}(h|y)$, it will be different depending on how $h$ is distributed. The integral in (2.149) generally cannot be computed in closed form, so it must be evaluated numerically. The Gaussian case is an exception.


2.5.1 MMSE Estimation of Complex Gaussian Variables 


We are particularly interested in the memoryless channel model in (2.132), which we restate as

$$y = hx + n, \quad n \sim \mathcal{N}_{\mathbb{C}}(0, N_0). \tag{2.150}$$
Suppose the channel response $h$ is unknown and should be estimated. We can then select the transmitted signal $x$ as a deterministic number known at both the transmitter and receiver, so that only $h$ is unknown in the product $hx$ in (2.150). We also know the distribution of the additive complex Gaussian noise $n$, but its realization is unknown. The goal is to compute the MMSE estimate of $h$ based on the observation $y$ obtained at the receiver.

We consider the case when the channel is complex Gaussian distributed: 


$$h \sim \mathcal{N}_{\mathbb{C}}(0, \beta). \tag{2.151}$$


To compute the MMSE estimator in (2.149), we must first determine the conditional PDF $f_{h|y}(h|y)$. This problem resembles the one considered in Section 2.2.3. If we divide all terms in (2.150) by $x$ (assuming $x \neq 0$), we obtain

$$\frac{1}{x} y = h + \frac{1}{x} n, \tag{2.152}$$

which is of the same form as (2.69) but with the variance $\beta$ for the desired variable and the variance $N_0/|x|^2$ for the noise term. Hence, we can utilize (2.74) to obtain

$$f_{h|y}(h|y) = \frac{\beta|x|^2 + N_0}{\pi \beta N_0} \exp\left(-\frac{\beta|x|^2 + N_0}{\beta N_0}\left|h - \frac{\beta x^*}{\beta|x|^2 + N_0} y\right|^2\right). \tag{2.153}$$


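The MMSE estimate is the mean value of this conditional PDF. By comparing (2.153) with the PDF of a complex Gaussian distribution, we notice that

$$h \sim \mathcal{N}_{\mathbb{C}}\left(\frac{\beta x^*}{\beta|x|^2 + N_0} y, \frac{\beta N_0}{\beta|x|^2 + N_0}\right) \tag{2.154}$$

when $y$ is known. Hence, the conditional mean value is $\mathbb{E}\{h|y\} = \frac{\beta x^*}{\beta|x|^2 + N_0} y$. The variance $\frac{\beta N_0}{\beta|x|^2 + N_0}$ in (2.154) does not depend on $y$ and is, therefore, the MSE of the estimate.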


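Lemma 2.11. Consider the estimation of $h \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$ from the observation $y = hx + n$, when the signal $x \in \mathbb{C}$ is known and $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. The MMSE estimator of $h$ is

$$\hat{h}_{\mathrm{MMSE}}(y) = \frac{\beta x^*}{\beta|x|^2 + N_0} y. \tag{2.155}$$

The corresponding minimum MSE is

$$\mathrm{MSE}_h = \frac{\beta N_0}{\beta|x|^2 + N_0}. \tag{2.156}$$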
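Lemma 2.11 can be checked by simulation. The Python sketch below (standard library only; $\beta$, $N_0$, and the pilot $x$ are arbitrary example values) implements (2.155) and (2.156), draws $L$ independent realizations of $h$ and $n$, and compares the sample average of $|h - \hat{h}_{\mathrm{MMSE}}(y)|^2$ with the closed-form MSE. The helper cgauss is our own; it draws from $\mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ by giving the real and imaginary parts variance $\sigma^2/2$ each.

```python
import math
import random

def cgauss(rng: random.Random, var: float) -> complex:
    """Draw one sample from N_C(0, var)."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

def mmse_estimate(y: complex, x: complex, beta: float, N0: float) -> complex:
    """MMSE estimate (2.155): beta * conj(x) * y / (beta |x|^2 + N0)."""
    return beta * x.conjugate() * y / (beta * abs(x) ** 2 + N0)

def mmse_mse(x: complex, beta: float, N0: float) -> float:
    """Minimum MSE (2.156): beta N0 / (beta |x|^2 + N0)."""
    return beta * N0 / (beta * abs(x) ** 2 + N0)

beta, N0, x = 2.0, 1.0, 1 + 1j   # example values, |x|^2 = 2
rng = random.Random(0)
L = 50_000
sq_err = 0.0
for _ in range(L):
    h = cgauss(rng, beta)              # h ~ N_C(0, beta), cf. (2.151)
    y = h * x + cgauss(rng, N0)        # observation (2.150)
    sq_err += abs(h - mmse_estimate(y, x, beta, N0)) ** 2
print("empirical MSE:", sq_err / L, " closed form:", mmse_mse(x, beta, N0))
```

With $\beta = 2$, $N_0 = 1$, and $|x|^2 = 2$, the closed-form MSE is $2/5 = 0.4$, and the empirical value should be close to it for this $L$.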
Among all possible estimators that utilize the observation $y$ and the channel statistics, the MMSE estimator minimizes the MSE. The MMSE estimate in (2.155) will be denoted as $\hat{h}$ later in this book without explicitly specifying what observation it is based on and which type of estimate it is.

We notice that this MMSE estimate is a linear function $ay$ of the observation $y$, which is scaled by the factor $a = \frac{\beta x^*}{\beta|x|^2 + N_0}$ to obtain the estimate that is closest to the true value of $h$ in the MSE sense. For this reason, the estimator in Lemma 2.11 is sometimes referred to as the linear MMSE (LMMSE) estimator; that is, the estimator that obtains the lowest MSE among all linear estimators. While it is formally correct to use that terminology, the naming undersells its properties by giving the wrong impression that there might exist better estimators that are non-linear functions of $y$. Hence, in the remainder of this book, we will call (2.155) the MMSE estimator.

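A useful benefit of the expression in (2.155) is that we can directly generate random realizations of $\hat{h}$ without first generating realizations of $y$, $h$, and $n$. Since $y \sim \mathcal{N}_{\mathbb{C}}(0, \beta|x|^2 + N_0)$, it follows that

$$\hat{h} \sim \mathcal{N}_{\mathbb{C}}\left(0, \left|\frac{\beta x^*}{\beta|x|^2 + N_0}\right|^2 (\beta|x|^2 + N_0)\right) = \mathcal{N}_{\mathbb{C}}\left(0, \frac{\beta^2|x|^2}{\beta|x|^2 + N_0}\right) = \mathcal{N}_{\mathbb{C}}(0, \beta - \mathrm{MSE}_h). \tag{2.157}$$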


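Moreover, the estimation error $\tilde{h} = h - \hat{h}$ is distributed as

$$\tilde{h} \sim \mathcal{N}_{\mathbb{C}}(0, \mathrm{MSE}_h) \tag{2.158}$$

with the MSE in (2.156) being the variance since

$$\mathrm{Var}\{\tilde{h}\} = \mathbb{E}\{|\tilde{h}|^2\} = \mathbb{E}\{|h - \hat{h}|^2\} = \mathrm{MSE}_h. \tag{2.159}$$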


The estimate and estimation error are statistically independent, which can be seen from the fact that they are jointly complex Gaussian distributed and uncorrelated. Their variances add up to that of the original unknown variable $h$: $\mathrm{Var}\{\hat{h}\} + \mathrm{Var}\{\tilde{h}\} = \beta - \mathrm{MSE}_h + \mathrm{MSE}_h = \beta$. This showcases how the MMSE estimator extracts all useful information from the observation $y$ so that the error term only contains information that was not observed. Consequently, the estimation error is also statistically independent of the observed signal $y$.
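The statistical properties (2.157) to (2.159) can also be verified numerically. The Python sketch below (standard library only; the parameter values are our own examples) estimates $\mathrm{Var}\{\hat{h}\}$, $\mathrm{Var}\{\tilde{h}\}$, and the cross-correlation $\mathbb{E}\{\hat{h}\tilde{h}^*\}$ from joint realizations, and additionally draws $\hat{h}$ directly from $\mathcal{N}_{\mathbb{C}}(0, \beta - \mathrm{MSE}_h)$, as (2.157) suggests.

```python
import math
import random

def cgauss(rng: random.Random, var: float) -> complex:
    """Draw one sample from N_C(0, var)."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

beta, N0, x = 1.0, 0.5, 0.8                             # example values
mse = beta * N0 / (beta * abs(x) ** 2 + N0)             # (2.156)
a = beta * x.conjugate() / (beta * abs(x) ** 2 + N0)    # scaling in (2.155)

rng = random.Random(3)
L = 50_000
var_hat = var_err = 0.0
cross = 0j
for _ in range(L):
    h = cgauss(rng, beta)
    h_hat = a * (h * x + cgauss(rng, N0))   # estimate from y = h x + n
    h_err = h - h_hat                        # estimation error
    var_hat += abs(h_hat) ** 2 / L
    var_err += abs(h_err) ** 2 / L
    cross += h_hat * h_err.conjugate() / L   # sample E{h_hat * conj(h_err)}

# (2.157): h_hat can also be drawn directly, without simulating h and n
direct = [cgauss(rng, beta - mse) for _ in range(L)]
var_direct = sum(abs(v) ** 2 for v in direct) / L

print(var_hat, var_direct, beta - mse)   # all three should be close
print(var_err, mse)                      # (2.158)-(2.159)
print(abs(cross))                        # near zero: uncorrelated
```

The two ways of obtaining $\hat{h}$ give matching variances, which is what makes the direct generation in (2.157) a convenient shortcut in simulations.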

Intuitively, the estimation quality should be better when the term $hx$ in (2.150) is much larger than the noise term when comparing their magnitudes. If we let $|x| \to \infty$, it follows that the MSE in (2.156) goes to zero and that the estimate’s variance in (2.157) approaches $\beta$. This means we can estimate the channel with vanishing error as the SNR grows large.

The MSE in (2.156) is an increasing function of $\beta$, so we should expect larger estimation errors when estimating a variable with a large variance compared to a small variance. However, it is the relative size of the estimation error that matters in many contexts, and it is quantified by the normalized MSE (NMSE) that is computed as

$$\mathrm{NMSE}_h = \frac{\mathbb{E}\{|h - \hat{h}|^2\}}{\mathbb{E}\{|h|^2\}} = \frac{\mathrm{MSE}_h}{\beta} = \frac{N_0}{\beta|x|^2 + N_0}. \tag{2.160}$$

The NMSE is a decreasing function of $\beta$, so it is easier to estimate a variable with a large variance than a small one as it stands out more from the noise.
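The opposite trends of the MSE and the NMSE in $\beta$ are easy to tabulate. The short Python sketch below uses $|x|^2 = N_0 = 1$ as an arbitrary example.

```python
def mse(beta: float, x_abs2: float, N0: float) -> float:
    """MSE (2.156) of the MMSE channel estimate."""
    return beta * N0 / (beta * x_abs2 + N0)

def nmse(beta: float, x_abs2: float, N0: float) -> float:
    """NMSE (2.160): the MSE divided by the variance beta of h."""
    return mse(beta, x_abs2, N0) / beta

betas = [0.1, 1.0, 10.0]
for beta in betas:
    print(f"beta = {beta:5.1f}: MSE = {mse(beta, 1.0, 1.0):.3f}, "
          f"NMSE = {nmse(beta, 1.0, 1.0):.3f}")
```

Over this range, the MSE grows from about 0.09 to 0.91, while the NMSE falls from about 0.91 to 0.09.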
We have now described how to estimate the channel coefficient $h$. The next example shows how to estimate signals using the MMSE estimator.


Example 2.15. Suppose we want to estimate the data signal $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$ from the received signal

$$y = hx + n, \tag{2.161}$$

where $h \in \mathbb{C}$ is a known constant channel and $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. What is the MSE if the MMSE estimator is used? Use the MSE expression to compute the mutual information $\mathcal{I}(x; y)$.

The MMSE estimation problem is the same as in Lemma 2.11, except that $x$ and $h$ have interchanged the roles of being known and unknown. We denote the MMSE estimate as $\hat{x}$. By making the variable substitutions $\beta \to q$ and $x \to h$ in (2.156), the MSE when estimating $x$ becomes

$$\mathrm{MSE}_x = \frac{qN_0}{q|h|^2 + N_0}. \tag{2.162}$$

The error is independent of $\hat{x}$ and distributed as $\tilde{x} = x - \hat{x} \sim \mathcal{N}_{\mathbb{C}}(0, \mathrm{MSE}_x)$.

The mutual information in (2.137) is equal to $\mathcal{H}(x) - \mathcal{H}(x|y)$ and Lemma 2.9 states that $\mathcal{H}(x) = \log_2(e\pi q)$ since the signal is complex Gaussian distributed with variance $q$. It further holds that

$$\mathcal{H}(x|y) = \mathcal{H}(x - \hat{x}|y) = \mathcal{H}(\tilde{x}|y) = \log_2(e\pi \mathrm{MSE}_x), \tag{2.163}$$

where the first equality follows from subtracting the MMSE estimate from $x$, which can be done without changing the entropy since $y$ is known. The last equality follows from noticing that the estimation error is independent of $y$ and complex Gaussian distributed with variance $\mathrm{MSE}_x$. The mutual information can finally be computed as

$$\mathcal{H}(x) - \mathcal{H}(x|y) = \log_2(e\pi q) - \log_2(e\pi \mathrm{MSE}_x) = \log_2\left(\frac{q}{\mathrm{MSE}_x}\right) = \log_2\left(1 + \frac{q|h|^2}{N_0}\right). \tag{2.164}$$

This is an alternative way of computing the capacity in (2.145).
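The identity in (2.164) can be double-checked numerically. The Python sketch below (the parameter triples are arbitrary examples of our choosing) evaluates the mutual information through $\mathrm{MSE}_x$ from (2.162) and (2.163) and compares it with the capacity formula (2.145).

```python
import math

def mse_x(q: float, h: complex, N0: float) -> float:
    """MSE (2.162) when estimating x ~ N_C(0, q) from y = h x + n."""
    return q * N0 / (q * abs(h) ** 2 + N0)

def mutual_info_via_mse(q: float, h: complex, N0: float) -> float:
    """H(x) - H(x|y) = log2(e pi q) - log2(e pi MSE_x), cf. (2.163)-(2.164)."""
    return math.log2(math.e * math.pi * q) - math.log2(math.e * math.pi * mse_x(q, h, N0))

def capacity(q: float, h: complex, N0: float) -> float:
    """Capacity (2.145) in bit/symbol."""
    return math.log2(1 + q * abs(h) ** 2 / N0)

for q, h, N0 in [(1.0, 1.0, 1.0), (2.0, 0.3 + 0.4j, 0.1), (0.5, 2j, 3.0)]:
    print(mutual_info_via_mse(q, h, N0), capacity(q, h, N0))
```

Each printed pair agrees up to floating-point round-off, since $q/\mathrm{MSE}_x = 1 + q|h|^2/N_0$.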
2.5.2 LMMSE Estimation of Arbitrarily Distributed Variables


We will now consider LMMSE estimation when the received signal is $y = hx + n$ as before, but the unknown variable $h$ and the noise $n$ might not be Gaussian distributed. An LMMSE estimator has the form $\hat{h} = ay$, where $a$ is selected to minimize the MSE. The MSE is a function of $a$ and can be minimized by equating the first-order derivative to zero:⁸

$$\frac{\partial}{\partial a^*} \mathbb{E}\{|\tilde{h}|^2\} = \frac{\partial}{\partial a^*} \mathbb{E}\{(h - ay)(h - ay)^*\} = -\mathbb{E}\{\tilde{h} y^*\}. \tag{2.165}$$


This necessary and sufficient condition for selecting $a$ is called the orthogonality principle: $\mathbb{E}\{\tilde{h} y^*\} = 0$. The interpretation is that the scaling factor $a$ must be designed so that the error term $\tilde{h} = h - \hat{h}$ is uncorrelated with the received signal $y$; that is, there is no useful information left that can be extracted using linear methods. It follows from the orthogonality principle that $\mathbb{E}\{\tilde{h} \hat{h}^*\} = \mathbb{E}\{\tilde{h} y^*\} a^* = 0$, which implies that the estimate and estimation error are uncorrelated random variables. In the special case where the estimate and estimation error are complex Gaussian distributed (which happens when $h$ and $n$ are Gaussian, as in the last section), it follows from Lemma 2.7 that the uncorrelated variables $\hat{h}$ and $\tilde{h}$ are also independent random variables. In the general non-Gaussian case, the estimate and error are only guaranteed to be uncorrelated.

The orthogonality principle can be used to find the LMMSE estimator, 
which we will show through an example. 


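Example 2.16. Use the orthogonality principle to derive the LMMSE estimator of $h$ given the received signal $y = hx + n$. Assume that $\mathbb{E}\{h\} = \mathbb{E}\{n\} = \mathbb{E}\{hn^*\} = 0$, $\mathbb{E}\{|h|^2\} = \beta$, and $\mathbb{E}\{|n|^2\} = N_0$.

An arbitrary linear estimator has the form $\hat{h} = ay$. We need to find the value of $a$ that satisfies the orthogonality principle $\mathbb{E}\{\tilde{h} y^*\} = 0$:

$$0 = \mathbb{E}\{\tilde{h} y^*\} = \mathbb{E}\{(h - ay) y^*\} = \mathbb{E}\{hy^*\} - a\mathbb{E}\{|y|^2\}. \tag{2.166}$$

By solving for $a$ in (2.166), we obtain

$$a = \frac{\mathbb{E}\{hy^*\}}{\mathbb{E}\{|y|^2\}} = \frac{\mathbb{E}\{h(hx + n)^*\}}{\mathbb{E}\{|hx + n|^2\}} = \frac{\mathbb{E}\{|h|^2\}x^* + \mathbb{E}\{hn^*\}}{\mathbb{E}\{|h|^2\}|x|^2 + \mathbb{E}\{|n|^2\}} = \frac{\beta x^*}{\beta|x|^2 + N_0} \tag{2.167}$$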


by utilizing that $h$ and $n$ are uncorrelated. In summary, the LMMSE estimator is $\hat{h} = ay$ with $a$ given in (2.167). It coincides with the MMSE estimator in (2.155) for complex Gaussian variables with the specified variances.
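The orthogonality principle can also be applied with sample averages in place of expectations. The Python sketch below uses a hypothetical non-Gaussian example of our own: $h = \sqrt{\beta}e^{j\theta}$ with $\theta$ uniform on $[0, 2\pi)$ (zero mean, variance $\beta$) and noise whose real and imaginary parts are independent and uniform on $[-c, c]$ with $c = \sqrt{3N_0/2}$ (zero mean, variance $N_0$). It estimates $a$ as the ratio of sample moments in (2.167) and checks that the sample version of $\mathbb{E}\{(h - ay)y^*\}$ is close to zero for the closed-form $a$.

```python
import cmath
import math
import random

beta, N0, x = 1.0, 0.5, 1 - 0.5j   # example values
c = math.sqrt(1.5 * N0)            # uniform on [-c, c] per dimension gives E{|n|^2} = N0

rng = random.Random(4)
L = 50_000
E_hy = 0j      # sample estimate of E{h y*}
E_yy = 0.0     # sample estimate of E{|y|^2}
samples = []
for _ in range(L):
    h = math.sqrt(beta) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))  # constant modulus
    n = complex(rng.uniform(-c, c), rng.uniform(-c, c))               # non-Gaussian noise
    y = h * x + n
    E_hy += h * y.conjugate() / L
    E_yy += abs(y) ** 2 / L
    samples.append((h, y))

a_sample = E_hy / E_yy                                         # sample version of (2.167)
a_formula = beta * x.conjugate() / (beta * abs(x) ** 2 + N0)   # closed form (2.167)
# Orthogonality check: the sample mean of (h - a y) y* should be near zero
ortho = sum((h - a_formula * y) * y.conjugate() for h, y in samples) / L
print(a_sample, a_formula, abs(ortho))
```

Only the first and second moments of $h$ and $n$ enter the computation, so the same $a$ results regardless of the uniform distributions chosen here.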




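⁸Since $a$ is a complex-valued parameter, we compute the Wirtinger derivative $\frac{\partial}{\partial a^*} = \frac{1}{2}\left(\frac{\partial}{\partial \Re(a)} + j\frac{\partial}{\partial \Im(a)}\right)$, which includes the derivatives with respect to $\Re(a)$ and $\Im(a)$.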
The derivation of the LMMSE estimator only used the mean, variance, and covariance of $h$ and $n$. This implies that the LMMSE estimator is the same irrespective of the exact distribution of $h$ and $n$, as long as the mean and (co)variance are as specified. On the other hand, the general MMSE estimator utilizes the complete statistical distributions; it will not only change in the non-Gaussian case but is also likely to be harder to derive analytically. The equivalence between the MMSE and LMMSE estimators generally only holds in the Gaussian case, so the MMSE estimator typically gives a strictly smaller MSE in non-Gaussian cases. This implies that, for given means and variances, estimating Gaussian variables that are observed in Gaussian noise is the hardest situation, which is aligned with the fact that the Gaussian distribution maximizes the differential entropy.
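These claims can be illustrated with a non-Gaussian example of our own choosing: $h$ drawn uniformly from the QPSK points $\sqrt{\beta/2}(\pm 1 \pm j)$, which has zero mean and variance $\beta$, observed in $\mathcal{N}_{\mathbb{C}}(0, N_0)$ noise with $x = 1$. The Python sketch below checks that the linear estimator with $a$ from (2.167) still attains the MSE $\beta N_0/(\beta|x|^2 + N_0)$ and compares it with the conditional mean $\mathbb{E}\{h|y\}$, which for this case and $x = 1$ is $\sqrt{\beta/2}\,[\tanh(\sqrt{2\beta}\,\Re(y)/N_0) + j\tanh(\sqrt{2\beta}\,\Im(y)/N_0)]$ (our own derivation, not from the text).

```python
import math
import random

def cgauss(rng: random.Random, var: float) -> complex:
    """Draw one sample from N_C(0, var)."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

beta, N0, x = 1.0, 1.0, 1.0                # x = 1 keeps the conditional mean simple
A = math.sqrt(beta / 2)                    # QPSK amplitude per dimension, E{|h|^2} = beta
a = beta * x / (beta * abs(x) ** 2 + N0)   # LMMSE scaling (2.167), real x

rng = random.Random(5)
L = 50_000
mse_linear = mse_cond_mean = 0.0
for _ in range(L):
    h = complex(rng.choice((-A, A)), rng.choice((-A, A)))   # non-Gaussian, zero mean
    y = h * x + cgauss(rng, N0)
    mse_linear += abs(h - a * y) ** 2 / L
    # Conditional mean E{h|y} for QPSK in N_C(0, N0) noise with x = 1 (our derivation)
    h_cm = A * complex(math.tanh(2 * A * y.real / N0), math.tanh(2 * A * y.imag / N0))
    mse_cond_mean += abs(h - h_cm) ** 2 / L

print("linear:", mse_linear, " formula:", beta * N0 / (beta * abs(x) ** 2 + N0))
print("conditional mean:", mse_cond_mean)
```

With $\beta = N_0 = 1$, the linear estimator’s MSE should come out near 0.5, as predicted by the second-order statistics alone, while the non-linear conditional mean gives a noticeably smaller value.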

We have established the following result regarding the LMMSE estimator 
when h is not necessarily Gaussian distributed. 


Lemma 2.12. Consider the estimation of $h$ from the observation $y = hx + n$, when the signal $x \in \mathbb{C}$ is known and $n$ is noise with zero mean and variance $N_0$. Suppose the variable $h$ has zero mean, variance $\beta$, and is uncorrelated with the noise (i.e., $\mathbb{E}\{hn^*\} = 0$). The LMMSE estimator of $h$ is

$$\hat{h}_{\mathrm{LMMSE}}(y) = \frac{\beta x^*}{\beta|x|^2 + N_0} y. \tag{2.168}$$

The corresponding minimum MSE is

$$\mathrm{MSE}_h = \frac{\beta N_0}{\beta|x|^2 + N_0}. \tag{2.169}$$


2.6 Monte Carlo Methods for Statistical Inference 


The previous section described how to estimate the realization of a random 
variable from noisy observations. An underlying assumption was that the 
statistics are known, but, in practice, we must also have a mechanism to 
acquire the statistics. In this section, we will describe how the statistical 
properties of functions of random variables can be inferred. The statistics 
might determine the performance of a communication system or an estimator. 
There are many categories of methods that can be utilized for this purpose. We 
will consider Monte Carlo methods that use random samples of the underlying 
variables and process them to infer the unknown deterministic quantities. We 
will estimate the mean value of a function of random variables, estimate the 
error probability of a system that performs a task either resulting in success 
or error, and estimate the CDF of a random variable. Particular attention 
will be given to quantifying the estimation precision, which is essential when 
drawing conclusions based on the outcome of statistical inference. 
2.6.1 Estimating the Mean Value


Consider a real-valued random variable $x$ with the PDF $f_x(x)$ and mean value denoted as $\mu$. We recall from (2.56) that the mean value is defined as

$$\mu = \mathbb{E}\{x\} = \int_{-\infty}^{\infty} x f_x(x)\,\partial x. \tag{2.170}$$

There are many situations where this integral cannot be computed analytically, and then we have to resort to numerical methods for computing an approximate value of $\mu$. One example is the Monte Carlo method that takes $L$ independent samples $x_1, \ldots, x_L$ from the random distribution and uses them to estimate $\mu$. Two properties are essential when designing the estimator: accuracy and precision. An estimator $\hat{\mu}_L$ is accurate if its mean is equal to the value to be estimated (i.e., $\mathbb{E}\{\hat{\mu}_L\} = \mathbb{E}\{x\} = \mu$) and it is precise if its variance $\mathrm{Var}\{\hat{\mu}_L\}$ is small. The sample average is an accurate (also known as unbiased) estimator of $\mathbb{E}\{x\}$ and is computed as


$$\hat{\mu}_L = \frac{1}{L}\sum_{i=1}^{L} x_i, \tag{2.171}$$


where the subscript denotes the number of samples. We only need a way to generate independent samples to compute this estimate, while the PDF can be unknown. The motivation behind using the sample average in (2.171) is the law of large numbers in Lemma 2.4, which says that the sample average approaches the statistical mean when the number of samples $L$ goes to infinity:

$$\hat{\mu}_L \to \mathbb{E}\{x\} \quad \text{as } L \to \infty. \tag{2.172}$$


A sufficient condition for the convergence is that the variance $\mathrm{Var}\{x\}$ of the random variable is finite. To see the reason for that, we can compute the variance of the sample average as

$$\mathrm{Var}\{\hat{\mu}_L\} = \frac{1}{L^2}\mathrm{Var}\left\{\sum_{i=1}^{L} x_i\right\} = \frac{1}{L^2}\sum_{i=1}^{L}\mathrm{Var}\{x_i\} = \frac{\mathrm{Var}\{x\}}{L}, \tag{2.173}$$

where the second equality utilizes the fact that the samples are independent. The variance in (2.173) reduces proportionally to $1/L$ when the number of samples increases, starting from the original variance value. Hence, as long as the original value is finite, the variance of the sample average goes to zero as $L \to \infty$. Furthermore, the standard deviation is the square root of the variance and becomes $\sqrt{\mathrm{Var}\{x\}/L}$, which goes to zero proportionally to $1/\sqrt{L}$ when increasing the number of samples.
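The $1/L$ decay in (2.173) can be observed by repeating the experiment many times. The Python sketch below (standard library only; the Exp(1) distribution with $\mathrm{Var}\{x\} = 1$ and the number of repetitions $R$ are our choices) computes the sample average (2.171) in $R$ independent experiments for each $L$ and compares the empirical variance of $\hat{\mu}_L$ with $\mathrm{Var}\{x\}/L$.

```python
import random
import statistics

def sample_average(rng: random.Random, L: int) -> float:
    """Sample average (2.171) of L i.i.d. Exp(1) samples (mean 1, variance 1)."""
    return sum(rng.expovariate(1.0) for _ in range(L)) / L

rng = random.Random(2024)
R = 1000   # number of repeated experiments per L
results = {}
for L in [10, 100, 400]:
    estimates = [sample_average(rng, L) for _ in range(R)]
    results[L] = statistics.variance(estimates)   # empirical Var{mu_hat_L}
    print(f"L = {L:4d}: empirical variance {results[L]:.5f}, Var{{x}}/L = {1 / L:.5f}")
```

Each tenfold increase in $L$ should reduce the empirical variance by roughly a factor of ten, up to the statistical fluctuations caused by the finite $R$.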

Depending on the application, the number of samples $L$ should be selected to achieve an estimate with the desired precision. Since the Monte Carlo method uses random samples, we can only guarantee the precision in a probabilistic sense; that is, we can make sure that the estimation error $|\hat{\mu}_L - \mu|$ is no larger than some specified error tolerance $\delta > 0$ with the (high) probability $1 - \epsilon$, where $\epsilon > 0$ is the (small) probability that the requirement is unsatisfied. In other words, we want to find $L$ such that

$$\Pr\{|\hat{\mu}_L - \mu| \leq \delta\} = \Pr\{\mu - \delta \leq \hat{\mu}_L \leq \mu + \delta\} = \Pr\{\underbrace{\hat{\mu}_L - \delta \leq \mu \leq \hat{\mu}_L + \delta}_{\text{Confidence interval}}\} \geq 1 - \epsilon, \tag{2.174}$$

where the third expression is known as a confidence interval with confidence level $1 - \epsilon$. It says that, for a fraction $1 - \epsilon$ of all realizations of the estimator $\hat{\mu}_L$, the true value $\mu$ lies between $\hat{\mu}_L - \delta$ and $\hat{\mu}_L + \delta$. It is common to begin by specifying $\epsilon$ to reach a desired confidence level and then either determine how large $\delta$ becomes in a given experimental setup (i.e., for a given $L$) or design the experiment (i.e., select $L$) to reach a desired value of $\delta$.

We can utilize Chebyshev’s inequality from Lemma 2.5 to derive an upper bound on how many samples are needed to satisfy (2.174) for given $\epsilon$ and $\delta$. However, the result will be overly conservative since it considers the worst-case random distribution. Since we consider the summation of $L$ independent and identically distributed realizations, the central limit theorem implies that $\hat{\mu}_L$ is approximately Gaussian distributed, as previously stated in (2.65). Hence, we can utilize that distribution when characterizing the required number of samples. Recall from (2.66) that 95% of all realizations are within two standard deviations from the mean value. If we set $\epsilon = 0.05$ and want to guarantee an estimation error smaller than $\delta$, then we need

$$2\sqrt{\frac{\mathrm{Var}\{x\}}{L}} \leq \delta \quad \Leftrightarrow \quad L \geq \frac{4\mathrm{Var}\{x\}}{\delta^2}. \tag{2.175}$$

For example, if $\mathrm{Var}\{x\} = 1$ and we want a precision of $\delta = 0.1$, then at least $L = 400$ samples are required to satisfy that requirement with (approximately) 95% probability. The variance might also be unknown, in which case an approximation of it can be utilized when determining the number of samples.
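The sample-size rule (2.175) and, for comparison, a Chebyshev-based rule can be coded in a few lines of Python. We assume the standard form of Chebyshev’s inequality, $\Pr\{|\hat{\mu}_L - \mu| \geq \delta\} \leq \mathrm{Var}\{x\}/(L\delta^2)$, which gives $L \geq \mathrm{Var}\{x\}/(\epsilon\delta^2)$; the exact statement of Lemma 2.5 is not repeated here.

```python
import math

def samples_gaussian_approx(var_x: float, delta: float) -> int:
    """Smallest L with 2*sqrt(var_x/L) <= delta, i.e., L >= 4 var_x / delta^2, cf. (2.175)."""
    return math.ceil(4 * var_x / delta ** 2 - 1e-9)   # tiny offset guards float round-off

def samples_chebyshev(var_x: float, delta: float, eps: float) -> int:
    """Smallest L with var_x / (L delta^2) <= eps, from Chebyshev's inequality."""
    return math.ceil(var_x / (eps * delta ** 2) - 1e-9)

print(samples_gaussian_approx(1.0, 0.1))     # 400, as in the text
print(samples_chebyshev(1.0, 0.1, 0.05))     # 2000
```

For $\mathrm{Var}\{x\} = 1$, $\delta = 0.1$, and $\epsilon = 0.05$, the Gaussian approximation asks for 400 samples while the Chebyshev bound asks for 2000, which illustrates how conservative the worst-case bound is.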

Figure 2.19 exemplifies how the Monte Carlo method can be utilized to estimate the mean value $\mu = 1$ of $x \sim \mathrm{Exp}(1)$, which has an exponential distribution. The number of samples $L$ is shown on the horizontal axis, and the vertical axis shows potential estimates of $\mu$. Figure 2.19(a) and (b) show how the value of $\hat{\mu}_L$ progresses in two different experiments where we add more and more samples to the estimator. The shaded area between the dashed lines shows the (approximate) confidence interval around $\hat{\mu}_L$, which contains $\mu$ with 95% probability. It is computed using the Gaussian approximation. The width of this interval reduces as $1/\sqrt{L}$ when $L$ increases because the width is proportional to the standard deviation. In both experiments, the estimator fluctuates, but the general trend is that more samples lead to a better estimate of $\mu$. Nevertheless, Experiment 2 shows that even with $L = 1000$ samples, the exact $\mu$ might be outside the confidence interval.

Figure 2.19: Example of estimation of the mean $\mu = 1$ of a random variable with exponential distribution using the Monte Carlo method. The value of $\hat{\mu}_L$ is shown as a function of $L$ in two different experiments. The 95% confidence interval is indicated, as well as the true value.
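An experiment in the spirit of Figure 2.19 can be sketched in Python as follows; this is not the code behind the figure. Samples from Exp(1) are added one at a time, and at selected values of $L$ the sketch reports $\hat{\mu}_L$ and the approximate 95% interval $\hat{\mu}_L \pm 2\sqrt{s^2/L}$, where the sample variance $s^2$ replaces the unknown $\mathrm{Var}\{x\}$. A second function repeats the experiment to estimate how often the interval actually contains $\mu = 1$. The function names are our own.

```python
import math
import random

def running_estimate(rng: random.Random, checkpoints, mean: float = 1.0):
    """Add Exp samples one at a time; at each checkpoint L (L >= 2) record
    (L, mu_hat_L, half_width) with half_width = 2 * sqrt(s^2 / L)."""
    total = total_sq = 0.0
    report = []
    for L in range(1, max(checkpoints) + 1):
        v = rng.expovariate(1.0 / mean)
        total += v
        total_sq += v * v
        if L in checkpoints:
            mu_hat = total / L
            s2 = (total_sq - L * mu_hat ** 2) / (L - 1)   # sample variance
            report.append((L, mu_hat, 2 * math.sqrt(s2 / L)))
    return report

def coverage(rng: random.Random, L: int, trials: int, mean: float = 1.0) -> float:
    """Fraction of independent experiments whose interval contains the true mean."""
    hits = 0
    for _ in range(trials):
        _, mu_hat, hw = running_estimate(rng, {L}, mean)[0]
        hits += mu_hat - hw <= mean <= mu_hat + hw
    return hits / trials

rng = random.Random(11)
for L, mu_hat, hw in running_estimate(rng, {10, 100, 1000}):
    print(f"L = {L:4d}: mu_hat = {mu_hat:.3f}, interval [{mu_hat - hw:.3f}, {mu_hat + hw:.3f}]")
print("coverage at L = 400:", coverage(random.Random(5), 400, 500))
```

The estimated coverage should be close to, but not exactly, 95%, since both the Gaussian approximation and the estimated variance are only approximately valid for the skewed exponential distribution.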

Instead of resorting to taking random samples, as in the Monte Carlo method, one can approximate the integral in (2.170) in a deterministic manner by approximating the integrand $x f_x(x)$ as a piecewise constant function. This is called a Riemann sum, and the approximation error can then be bounded in a non-probabilistic manner, but it only works if the PDF $f_x(x)$ is known. In contrast, the Monte Carlo method is convenient in practical situations where the PDF is unknown. For example, suppose the random variable $x$ is obtained as a function of some multivariate random variable $y$; that is, $x = a(y)$ where $a(\cdot)$ can be any deterministic function. In this case, the PDF of $x$ might be hard to characterize, even if the PDF of $y$ is known. The Monte Carlo method can even be utilized when $y$ has an unknown PDF, as long as samples from this random variable can be obtained from measurements. In wireless communications, $y$ might be the randomness occurring in the propagation environment, while $a(\cdot)$ could be a complicated function that determines the communication performance.

Under these circumstances, we can still obtain an approximation of the mean value using the following procedure:

1. Determine the required number of samples $L$;
2. Draw $L$ independent samples $y_1, \ldots, y_L$ of the random variable $y$;
3. Compute the $L$ corresponding samples of the random variable $x$, denoted as $x_i = a(y_i)$ for $i = 1, \ldots, L$;
4. Compute the sample average $\hat{\mu}_L = \frac{1}{L}\sum_{i=1}^{L} x_i$ to estimate $\mu = \mathbb{E}\{x\}$.
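The four steps can be wrapped in a generic helper. In the Python sketch below, the function names and the example are hypothetical: $y = (y_1, y_2)$ has i.i.d. $\mathcal{N}(0, 1)$ entries and $x = a(y) = \max(y_1, y_2)$, for which the exact mean $1/\sqrt{\pi} \approx 0.564$ is known and serves as a reference. Since $\mathrm{Var}\{x\} = 1 - 1/\pi \approx 0.68$ in this example, (2.175) with $L = 20000$ corresponds to $\delta \approx 0.012$.

```python
import math
import random

def monte_carlo_mean(a, draw_y, L: int, rng: random.Random) -> float:
    """Steps 2-4: draw y_1, ..., y_L, compute x_i = a(y_i), and return their average."""
    total = 0.0
    for _ in range(L):
        y = draw_y(rng)     # step 2: one independent sample of y
        total += a(y)       # step 3: corresponding sample x_i = a(y_i)
    return total / L        # step 4: sample average mu_hat_L

def draw_pair(rng: random.Random):
    """Hypothetical y = (y1, y2) with i.i.d. N(0, 1) entries."""
    return rng.gauss(0, 1), rng.gauss(0, 1)

def largest(y):
    """Hypothetical deterministic function a(y) = max(y1, y2)."""
    return max(y)

L = 20_000   # step 1: chosen with (2.175) in mind
estimate = monte_carlo_mean(largest, draw_pair, L, random.Random(0))
print(estimate, 1 / math.sqrt(math.pi))
```

The helper never needs the PDF of $x$; replacing draw_pair by a routine that returns measured samples of $y$ would leave the rest unchanged.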


The samples must be generated independently and from the same distribu- 
tion. Otherwise, the sample average might not converge to the correct number 
or not converge at all as L increases. These conditions put constraints on the 
methodology used when gathering the samples. One should, for example, be 
careful when merging measurements taken at different points in time, with 
different equipment, or at different locations. Computer simulations are robust 
against some of these effects but can nevertheless be affected by correlation in 
the (pseudo)random number generator (e.g., if multiple computers generate 
samples using the same random seed), limited arithmetic precision, other 
processes running in the same hardware, etc. 

If the same $L$ samples are utilized to estimate multiple quantities, then their respective estimation errors will be correlated, leading to systematic errors that are hard to discover. As an example, suppose we want to use the Monte Carlo method to compute the MSE $\frac{\beta N_0}{\beta|x|^2 + N_0}$ in (2.156) of the MMSE estimator for a range of different signal strengths $|x|^2$. This might be the only way of determining the MSE in situations where it cannot be computed analytically.
Figure 2.20: Example of estimation of MSE in (2.156) with $\beta = N_0 = 1$ using the Monte Carlo method with $L = 100$ samples. Independent samples must be utilized when estimating different points on the curve; otherwise, systematic errors occur.


According to (2.159), the MSE is equal to $\mathbb{E}\{|h - \hat{h}|^2\}$, where $h$ is the desired variable and $\hat{h}$ is its MMSE estimate. To compute this mean using the Monte Carlo method, we should generate $L$ independent realizations of $|h - \hat{h}|^2$ and compute the sample mean. For any non-zero value of $|x|^2$, we can generate $L$ independent realizations of $h$ and the noise $n$, then compute the observation $y = hx + n$, and finally compute $|h - \hat{h}|^2$ using (2.155).

Figure 2.20 shows the exact MSE and estimated MSE for $\beta = N_0 = 1$ and varying signal strength $|x|^2$. Since there are many points on the estimated curves, we can implement the Monte Carlo method in two different ways: a) we can generate $L = 100$ independent samples of $h$ and $n$ and then utilize this set to estimate every point on the curve (by varying $|x|^2$ when computing $|h - \hat{h}|^2$); b) each point on the curve is estimated using $L$ new independent realizations of $h$ and $n$. From a programming perspective, the difference is whether the $L$ samples are generated before the for-loop that goes through each value of $|x|^2$ or if $L$ new samples are generated in each iteration of the loop. The blue curve is generated in the former way, where the same realizations are utilized to estimate every point. This results in a smooth curve that gives the impression of being highly accurate, but this is deceiving, as seen from the gap to the exact curve. Because the same randomness is used when estimating every point, the estimation errors at different points are correlated, which produces systematic errors that are hard to notice. The latter approach is recommended: generate $L$ new independent samples for every value of $|x|^2$, which was done when generating the red curve. This curve is not smooth, showcasing the limited precision obtained when only using $L = 100$ samples in the Monte Carlo method. In summary, to obtain an estimate of the curve that is both precise (i.e., smooth) and accurate (i.e., without systematic errors), we must use an even larger number of samples generated independently for every point on the curve.
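The two loop structures can be contrasted in a short Python sketch. It assumes a scalar model that is not restated in this section, so treat it as illustrative: $h \sim \mathcal{N}_{\mathbb{C}}(0,\beta)$ and $n \sim \mathcal{N}_{\mathbb{C}}(0,N_0)$ are independent, $x$ is a deterministic pilot, and the MMSE estimate is $\hat{h} = \beta x^* y/(\beta|x|^2 + N_0)$ with MSE $\beta N_0/(\beta|x|^2 + N_0)$; these expressions should be checked against (2.155)-(2.156). Only the Python standard library is used, and all function names are hypothetical.

```python
import math
import random

def complex_gaussian(var, rng):
    """One sample of a circularly symmetric complex Gaussian with variance var."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def draw_samples(beta, N0, L, rng):
    """L independent realizations of the pair (h, n)."""
    return [(complex_gaussian(beta, rng), complex_gaussian(N0, rng)) for _ in range(L)]

def mse_estimate(x, beta, N0, samples):
    """Sample mean of |h - h_hat|^2 over the given realizations of (h, n)."""
    total = 0.0
    for h, n in samples:
        y = h * x + n
        h_hat = beta * x.conjugate() * y / (beta * abs(x) ** 2 + N0)  # assumed MMSE estimator
        total += abs(h - h_hat) ** 2
    return total / len(samples)

def exact_mse(x, beta, N0):
    """Assumed closed-form MSE of the estimator above."""
    return beta * N0 / (beta * abs(x) ** 2 + N0)

rng = random.Random(2024)
beta = N0 = 1.0
L = 100
powers = [0.1 * k for k in range(1, 51)]  # values of |x|^2

# (a) L samples drawn once, before the loop, and reused for every point
shared = draw_samples(beta, N0, L, rng)
curve_shared = [mse_estimate(math.sqrt(p), beta, N0, shared) for p in powers]

# (b) L new independent samples drawn in every iteration of the loop
curve_fresh = [mse_estimate(math.sqrt(p), beta, N0, draw_samples(beta, N0, L, rng))
               for p in powers]

curve_exact = [exact_mse(math.sqrt(p), beta, N0) for p in powers]
```

Plotting `curve_shared`, `curve_fresh`, and `curve_exact` against `powers` should show the qualitative behavior in Figure 2.20: the reused samples give a smooth curve that can sit consistently above or below the exact one, whereas fresh samples give a jagged curve scattered around it.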


2.6.2 Estimating the Error Probability 


Another common problem in communications is computing the error probability or its complement, the success probability. For instance, we might design a communication protocol to convey messages over a random channel and want to determine the probability that a message is received in error. The more complicated the protocol and communication channel are, the smaller the chance that we can compute the error probability analytically. However, we can use Monte Carlo methods to estimate the error probability. Since there are only two possible outcomes (success or error), the randomness can be modeled by a Bernoulli distribution: a random variable $x$ that takes the value 1 with probability $p$ and the value 0 with probability $1 - p$. The mean is $\mathbb{E}\{x\} = p$ and the variance is $\mathrm{Var}\{x\} = p(1 - p)$.

Suppose we associate the outcome 1 of the Bernoulli distribution with an error; then our goal is to obtain an estimate $\hat{p}$ of the mean $p$, which represents the error probability. Hence, we can follow the same procedure as in the previous section: generate $L$ independent samples $x_1, \ldots, x_L$ of the Bernoulli distribution and then use the sample average $\hat{p}_L = \frac{1}{L}\sum_{i=1}^{L} x_i$ as the estimate of $p$. Each sample can be obtained from one independent trial of the communication protocol by determining whether an error occurred or not. This is a feasible approach, but the main practical hurdle is determining the number of samples that need to be taken. The error probabilities in communication systems can range between 0.1 and $10^{-9}$, which require very different error tolerances and numbers of samples when being estimated.

Suppose we select the error tolerance proportionally to $p$ as $\delta = \alpha p$, where $\alpha \in [0,1]$ is the relative error tolerance. The goal is then to find an estimate $\hat{p}_L$ that falls into the interval $[(1-\alpha)p, (1+\alpha)p]$ with high certainty. By substituting this value of $\delta$ into (2.175), we need

$$
L \geq \frac{4\,\mathrm{Var}\{x\}}{\delta^2} = \frac{4p(1-p)}{\alpha^2 p^2} = \frac{4(1-p)}{\alpha^2 p} \tag{2.176}
$$

samples to satisfy the tolerance with 95% certainty. This value depends on $p$, so we need a good sense of the (worst-case) error probability when selecting $L$, which severely limits its applicability. However, one important observation can be made from (2.176): if $p$ is much smaller than one, then $(1-p)/p \approx 1/p$ and the required number of samples is inversely proportional to $p$. Hence, the more unlikely an error is to occur, the more samples are needed to obtain an accurate estimate, which is rather intuitive. A classical rule-of-thumb is that $L \geq 10/p$ samples are needed to obtain a rough estimate of $p$ [45], which implies that we need $L = 1000$ if $p = 10^{-2}$ and $L = 10^6$ if $p = 10^{-5}$. Using at least $L = 100/p$ samples is recommended to get a precise estimate.
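The sample-size requirement in (2.176) can be checked numerically. The sketch below (Python standard library; function names are hypothetical) computes the smallest $L$ satisfying (2.176) and then repeats the fixed-$L$ experiment many times with simulated Bernoulli trials to see how often $\hat{p}_L$ lands in $[(1-\alpha)p, (1+\alpha)p]$.

```python
import math
import random

def samples_needed(p, alpha):
    """Smallest integer L satisfying (2.176): L >= 4(1 - p) / (alpha^2 p)."""
    return math.ceil(4 * (1 - p) / (alpha ** 2 * p))

def coverage(p, alpha, L, experiments, rng):
    """Fraction of repeated fixed-L experiments with p_hat in [(1-alpha)p, (1+alpha)p]."""
    hits = 0
    for _ in range(experiments):
        errors = sum(rng.random() < p for _ in range(L))  # L simulated Bernoulli(p) trials
        p_hat = errors / L
        hits += (1 - alpha) * p <= p_hat <= (1 + alpha) * p
    return hits / experiments

p, alpha = 0.1, 0.2
L = samples_needed(p, alpha)
fraction = coverage(p, alpha, L, 400, random.Random(1))
print(L, fraction)                # fraction is expected to be roughly 0.95
print(samples_needed(1e-5, 0.1))  # tens of millions of samples for a rare error
```

The second print illustrates the $1/p$ scaling: a 10% relative tolerance at $p = 10^{-5}$ already calls for about $4 \cdot 10^7$ trials.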

There is an alternative estimation approach that is particularly well suited for estimating error probabilities without requiring prior knowledge when determining the sample size [46]: we generate independent samples repeatedly until we have gathered $L_{\text{error}}$ errors, where $L_{\text{error}} > 2$ is a predefined constant. The number of successful samples $L_{\text{success}}$ that are observed before we reach $L_{\text{error}}$ errors is a random variable that has the negative binomial distribution. Based on a random realization of $L_{\text{success}}$, we can estimate $p$ as

$$
\hat{p} = \frac{L_{\text{error}} - 1}{L_{\text{success}} + L_{\text{error}} - 1}. \tag{2.177}
$$

This estimator is unbiased (i.e., $\mathbb{E}\{\hat{p}\} = p$) and is also the one minimizing the error variance [47]. The standard deviation of this estimator is approximately $p/\sqrt{L_{\text{error}} - 2}$ when $p$ is small; thus, it is proportional to $p$ and decreases roughly as $1/\sqrt{L_{\text{error}}}$. Suppose $p$ is relatively large, in the sense that $1 - p$ cannot be approximated as 1. Then the standard deviation is larger because we gather errors too quickly to reach a sufficient total number $L_{\text{success}} + L_{\text{error}}$ of measurements to get an accurate estimate.

A classical rule-of-thumb is to make measurements until we have observed $L_{\text{error}} = 10$ errors [46], which gives a rough estimate of $p$ with a standard deviation of roughly $p/\sqrt{8} \approx 0.35p$ when $p$ is small. To get a precise estimate with a smaller standard deviation, observing at least $L_{\text{error}} = 100$ errors is recommended. In those cases, the $-1$ terms in (2.177) can be neglected.
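A minimal Python sketch of the estimator in (2.177), assuming each trial is a hypothetical Bernoulli experiment with error probability $p$ (in practice, one run of the protocol). Repeating the whole procedure many times gives an empirical view of the unbiasedness and of the standard deviation of roughly $p/\sqrt{L_{\text{error}} - 2}$.

```python
import random
import statistics

def estimate_until_errors(p, L_error, rng):
    """Run trials until L_error errors occur; return (2.177) and the number of trials."""
    errors = successes = 0
    while errors < L_error:
        if rng.random() < p:  # one hypothetical trial: error with probability p
            errors += 1
        else:
            successes += 1
    p_hat = (L_error - 1) / (successes + L_error - 1)
    return p_hat, successes + errors

rng = random.Random(3)
p, L_error = 0.05, 10
estimates = [estimate_until_errors(p, L_error, rng)[0] for _ in range(2000)]
print(statistics.mean(estimates) / p)   # close to 1, since the estimator is unbiased
print(statistics.stdev(estimates) / p)  # roughly 0.3 to 0.35, in line with 1/sqrt(L_error - 2)
```

The number of trials is not fixed in advance; it adapts to $p$ and is about $L_{\text{error}}/p$ on average.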


Figure 2.21 exemplifies the error probability $p$ as a function of the SNR. The true relation is $p = 1 - e^{-1/\mathrm{SNR}}$ in this example, which is a formula that will be derived in Chapter 5. In addition to showing the exact curve, Figure 2.21 also shows estimated curves obtained using the two approaches described above. The blue curve uses $L = 10000$ samples and provides excellent estimates for $p \geq 10^{-2}$ and decent estimates for $10^{-3} \leq p < 10^{-2}$, as predicted by the first rule-of-thumb. The curve then vanishes since there are too few samples to measure any error events; whenever fewer than ten errors have been observed, we should discard the result as unreliable (recall the second rule-of-thumb). The red curve uses the alternative approach of running the simulation until $L_{\text{error}} = 100$ errors have been observed. This curve provides accurate estimates of $p$ for all the considered SNR values.


In summary, to avoid selecting $L$ in advance, we can estimate the error probability $p$ by counting the number of successes that occurred before we reached a predefined number of errors. The number of samples to gather is then determined dynamically and is inversely proportional to the true value of $p$ (about $L_{\text{error}}/p$ samples on average). This approach is particularly useful when a complicated communication protocol is used so that the error probability cannot be determined analytically.
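The comparison in Figure 2.21 can be mimicked with a small simulation. As a stand-in for one trial of the protocol, the sketch assumes (our assumption, not the derivation in Chapter 5) that an error occurs when $g \cdot \mathrm{SNR} < 1$ with $g$ exponentially distributed with unit mean, which gives exactly $p = 1 - e^{-1/\mathrm{SNR}}$. The function names and SNR values are illustrative.

```python
import math
import random

def trial_fails(snr, rng):
    """Hypothetical trial: error if g * SNR < 1 with g ~ Exp(1), so Pr{error} = 1 - exp(-1/SNR)."""
    return rng.expovariate(1.0) * snr < 1

def fixed_sample_estimate(snr, L, rng):
    errors = sum(trial_fails(snr, rng) for _ in range(L))
    return errors / L, errors

def sequential_estimate(snr, L_error, rng):
    errors = successes = 0
    while errors < L_error:
        if trial_fails(snr, rng):
            errors += 1
        else:
            successes += 1
    return (L_error - 1) / (successes + L_error - 1)  # (2.177)

rng = random.Random(11)
results = {}
for snr_db in (10, 20, 30, 40):
    snr = 10 ** (snr_db / 10)
    exact = 1 - math.exp(-1 / snr)
    p_fixed, n_errors = fixed_sample_estimate(snr, 10_000, rng)
    p_seq = sequential_estimate(snr, 100, rng)
    results[snr_db] = (exact, p_fixed, n_errors, p_seq)
    note = "" if n_errors >= 10 else "  <- fewer than 10 errors: discard fixed-L value"
    print(f"{snr_db} dB: exact {exact:.2e}, fixed-L {p_fixed:.2e}, sequential {p_seq:.2e}{note}")
```

At the highest SNR, the fixed-$L$ estimate is based on very few (possibly zero) errors, while the sequential estimate simply runs longer.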
Figure 2.21: Example of estimation of the error probability curve $p = 1 - e^{-1/\mathrm{SNR}}$ using the Monte Carlo method, by either using 10000 random samples or running the simulation until 100 errors have been observed.


2.6.3 Empirical Cumulative Distribution Function 


In addition to estimating the mean value of a random variable from observations, we can estimate its entire distribution. In this section, we will estimate the CDF, defined in (2.100), which fully characterizes the random distribution. Suppose we obtain $L$ independent samples $x_1, \ldots, x_L$ from a random distribution with the CDF $F_x(a)$. For a given value $a$, the CDF represents the probability of obtaining a realization below or equal to the threshold $a$. Hence, we can estimate $F_x(a)$ by counting the fraction of the $L$ samples that are lower than or equal to $a$. This estimator can be defined as

$$
\hat{F}_{x,L}(a) = \frac{1}{L}\sum_{i=1}^{L} I_{x_i \leq a} \tag{2.178}
$$

by utilizing the indicator function

$$
I_{x \leq a} = \begin{cases} 1, & \text{if } x \leq a, \\ 0, & \text{if } x > a. \end{cases} \tag{2.179}
$$

We can treat $\hat{F}_{x,L}(a)$ as an estimate of the entire CDF and call it the empirical cumulative distribution function (eCDF). The true CDF might be a continuous function, but the eCDF is always a piecewise constant function. It will look like a staircase with $L$ steps (assuming all samples are distinct), each having a vertical height of $1/L$ but varying horizontal widths that determine the shape of the estimated curve.
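The estimator (2.178) can be evaluated efficiently by sorting the samples once and counting, for each $a$, how many sorted samples are at most $a$. A minimal Python sketch (the function name `ecdf` is hypothetical):

```python
import bisect

def ecdf(samples):
    """Return the eCDF in (2.178) as a function of the threshold a."""
    ordered = sorted(samples)
    L = len(ordered)

    def F_hat(a):
        # bisect_right gives the number of sorted samples that are <= a
        return bisect.bisect_right(ordered, a) / L

    return F_hat

F_hat = ecdf([0.3, 1.2, 0.7, 2.5])
print([F_hat(a) for a in (0.0, 0.3, 1.0, 3.0)])  # → [0.0, 0.25, 0.5, 1.0]
```

Each evaluation costs $O(\log L)$ after the initial sort, and the returned values jump by $1/L$ at every distinct sample, which is the staircase shape described above.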

The eCDF converges to the true CDF as $L$ goes to infinity, and the convergence can be proved in various ways. For example, we can prove pointwise convergence by comparing $F_x(a)$ to its estimate $\hat{F}_{x,L}(a)$ for any given point $a$. For a random $x$, the indicator function $I_{x \leq a}$ will output a random variable with a Bernoulli distribution that gives 1 with probability $F_x(a)$ and 0 with probability $1 - F_x(a)$. As discussed in the previous section, such a random variable has the mean $F_x(a)$ and variance $F_x(a)(1 - F_x(a))$. Hence, $\hat{F}_{x,L}(a)$ is the sample average of $L$ independent Bernoulli variables having that mean and variance. The mean of $\hat{F}_{x,L}(a)$ is the true CDF value $F_x(a)$ and the variance can be determined using (2.173) as

$$
\mathrm{Var}\{\hat{F}_{x,L}(a)\} = \frac{1}{L^2}\sum_{i=1}^{L}\mathrm{Var}\{I_{x_i \leq a}\} = \frac{F_x(a)(1 - F_x(a))}{L}. \tag{2.180}
$$

The variance goes to zero as $L \to \infty$, which is the property used by the law of large numbers to establish asymptotic convergence to the mean. When $L$ is large but finite, the central limit theorem implies that $\hat{F}_{x,L}(a)$ is approximately Gaussian distributed with mean $F_x(a)$ and variance $F_x(a)(1 - F_x(a))/L$. We recall from Section 2.2.1 that 95% of all realizations of a Gaussian random variable are within two standard deviations from the mean.

The precision of the eCDF varies over the curve, reflected by the fact that the standard deviation $\sqrt{F_x(a)(1 - F_x(a))/L}$ depends on $F_x(a)$. The largest value appears at the median where $F_x(a) = 0.5$. However, it might be more important to consider the relative deviation from the true CDF value. If we divide the standard deviation by $F_x(a)$, we obtain $\sqrt{(1 - F_x(a))/(L F_x(a))}$, which grows without bound as $F_x(a) \to 0$. This reveals that it is hardest to precisely approximate the lower-left tail of the curve because very few samples appear in that tail, and small deviations are large in the relative sense. When selecting the number of samples $L$ in a practical experiment, one can either target a desired precision in the crucial parts of the CDF curve (e.g., center or tails) or run the simulation until a visually smooth eCDF curve is obtained.
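The variance expression (2.180) and the growth of the relative deviation in the tail can be checked empirically. The sketch assumes, purely for convenience, a uniform distribution on $[0,1]$ so that $F_x(a) = a$; it repeats the eCDF estimate at a fixed point many times and compares the empirical standard deviation with $\sqrt{F_x(a)(1-F_x(a))/L}$.

```python
import math
import random
import statistics

def ecdf_at(samples, a):
    """(2.178) evaluated at a single point a: fraction of samples <= a."""
    return sum(s <= a for s in samples) / len(samples)

rng = random.Random(4)
L, repetitions = 100, 3000
results = {}
for a in (0.5, 0.1, 0.02):  # x ~ Uniform(0, 1) is assumed, so F_x(a) = a
    estimates = [ecdf_at([rng.random() for _ in range(L)], a) for _ in range(repetitions)]
    empirical_std = statistics.stdev(estimates)
    predicted_std = math.sqrt(a * (1 - a) / L)  # square root of (2.180)
    relative_std = predicted_std / a            # sqrt((1 - F) / (L F))
    results[a] = (empirical_std, predicted_std, relative_std)
    print(f"F={a}: empirical std {empirical_std:.4f}, predicted {predicted_std:.4f}, "
          f"relative {relative_std:.2f}")
```

The absolute standard deviation is largest at the median, but the relative one is largest in the lower tail, as stated above.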

Figure 2.22 considers the estimation of the CDF of $x \sim \mathrm{Rayleigh}(1/\sqrt{2})$. The analytical CDF expression $F(x) = 1 - e^{-x^2}$ of this Rayleigh distribution was provided in (2.102). The red curve shows the eCDF obtained using $L = 100$ independent samples of the random variable. The eCDF has the same general shape as the true CDF but fluctuates between being well aligned with it and deviating from it. The estimation errors are correlated along the curve since the same $L$ samples are utilized to estimate all the points on the curve, but this property is unavoidable when computing an eCDF. The 95% confidence interval around the eCDF (obtained using the Gaussian approximation) is also shown in the figure. This interval is relatively wide, which shows that more than 100 samples are needed to obtain a precise eCDF. The staircase shape of the eCDF is particularly evident in the lower tail, where there are too few samples to estimate the precise shape of the CDF.
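A sketch of the ingredients of Figure 2.22 in Python: samples of $x \sim \mathrm{Rayleigh}(1/\sqrt{2})$ are generated as the square root of a unit-mean exponential variable (so that $F(x) = 1 - e^{-x^2}$), and the approximate 95% interval is formed as $\pm 2$ standard deviations. Because the true CDF is unknown in practice, the sketch plugs the eCDF value into (2.180); that choice is our assumption about how such a band can be computed, and the function names are hypothetical.

```python
import bisect
import math
import random

def rayleigh_inv_sqrt2(rng):
    """x ~ Rayleigh(1/sqrt(2)): x^2 is exponential with mean 1, so F(x) = 1 - exp(-x^2)."""
    return math.sqrt(rng.expovariate(1.0))

def ecdf_with_band(samples, grid):
    """eCDF (2.178) on a grid with an approximate 95% band (two standard deviations)."""
    ordered = sorted(samples)
    L = len(ordered)
    rows = []
    for a in grid:
        F_hat = bisect.bisect_right(ordered, a) / L
        half = 2 * math.sqrt(F_hat * (1 - F_hat) / L)  # assumption: F_hat plugged into (2.180)
        rows.append((a, F_hat, max(0.0, F_hat - half), min(1.0, F_hat + half)))
    return rows

rng = random.Random(5)
grid = [0.1 * k for k in range(31)]
rows = ecdf_with_band([rayleigh_inv_sqrt2(rng) for _ in range(100)], grid)
for a, F_hat, low, high in rows[::5]:
    print(f"a={a:.1f}: eCDF {F_hat:.2f} in [{low:.2f}, {high:.2f}], true {1 - math.exp(-a * a):.2f}")
```

Where the eCDF is exactly 0 or 1, this plug-in band collapses to zero width, which is one more reason the tails need extra care.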

Figure 2.22: The CDF $F(x) = 1 - e^{-x^2}$ of a Rayleigh distributed random variable is compared with the eCDF obtained using $L = 100$ samples from the distribution. The approximate 95% confidence interval is indicated as a reference.

Figure 2.23: The eCDFs and confidence intervals of $y \sim \mathrm{Rayleigh}(1/\sqrt{2})$ and $z \sim \mathrm{Rayleigh}(1)$, based on $L = 1000$ samples from each distribution.

The precision is essential when comparing different random variables based on estimates of their respective distributions. For example, we might obtain measurements of the performance variations in two different communication systems and plot their respective eCDFs to determine which system is preferable. For the sake of argument, Figure 2.23 shows the eCDFs obtained by $L = 1000$ samples from $y \sim \mathrm{Rayleigh}(1/\sqrt{2})$ and $z \sim \mathrm{Rayleigh}(1)$, respectively. The two eCDFs are different, but most importantly, the 95% confidence intervals (also shown in the figure) are different and mostly non-overlapping. Whenever that happens, we can make meaningful comparisons of the eCDFs. Since $\hat{F}_{y,L}(x) > \hat{F}_{z,L}(x)$ for most values of $x$ (i.e., the $y$-curve is above the $z$-curve), we can conclude that the system represented by $y$ is likely to provide smaller performance values. For example, $y$ is smaller than 1 with probability $\hat{F}_{y,L}(1) \approx 0.6$, while $z$ is smaller than 1 with probability $\hat{F}_{z,L}(1) \approx 0.4$. If it is preferable to have a large value, the system represented by $z$ should be selected. The only uncertainty occurs in the lower-left tail, where the confidence intervals partially overlap. When this happens, we can only conclude that their performance is so similar that we cannot tell the systems apart with statistical significance. This issue can be mitigated by increasing $L$ to improve the precision (i.e., reduce the standard deviation).
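The comparison at $x = 1$ can be reproduced with a short sketch. It uses a hypothetical interval construction (eCDF plus or minus two estimated standard deviations, following the Gaussian approximation) and generates $z \sim \mathrm{Rayleigh}(1)$ as the square root of an exponential variable with mean 2, whose CDF is $1 - e^{-x^2/2}$.

```python
import math
import random

def ecdf_interval(samples, a):
    """eCDF at a with an approximate 95% interval (Gaussian approximation, eCDF plugged in)."""
    L = len(samples)
    F_hat = sum(s <= a for s in samples) / L
    half = 2 * math.sqrt(F_hat * (1 - F_hat) / L)
    return F_hat, F_hat - half, F_hat + half

rng = random.Random(6)
L = 1000
y = [math.sqrt(rng.expovariate(1.0)) for _ in range(L)]  # Rayleigh(1/sqrt(2)): y^2 has mean 1
z = [math.sqrt(rng.expovariate(0.5)) for _ in range(L)]  # Rayleigh(1): z^2 has mean 2

Fy, y_low, y_high = ecdf_interval(y, 1.0)
Fz, z_low, z_high = ecdf_interval(z, 1.0)
separated = z_high < y_low or y_high < z_low  # True when the two intervals do not overlap
print(f"F_y(1) ~ {Fy:.2f} in [{y_low:.2f}, {y_high:.2f}]")
print(f"F_z(1) ~ {Fz:.2f} in [{z_low:.2f}, {z_high:.2f}], separated: {separated}")
```

The exact values are $1 - e^{-1} \approx 0.63$ and $1 - e^{-1/2} \approx 0.39$, consistent with the approximate values read from Figure 2.23.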


2.7 Detection Theory 


Detection theory provides a structured way to determine which event occurred 
among a finite number of possibilities based on probabilistic observations. It 
is commonly used in several areas, particularly radar signal processing and 
communications [48]. The task of the detector is to determine which event has 
happened by processing the observed signal and exploiting prior information 
regarding the received signal’s characteristics and statistics. The events are 
mutually exclusive, and each is called a hypothesis under testing. Due to this 
terminology, detection theory is also known as hypothesis testing [48]. 

To exemplify the basics, we consider a fire-alarm sensor that measures the 
smoke density in its surroundings. If there is smoke, it sends a wireless message 
representing “1”. If there is no smoke, the sensor does not transmit anything, 
representing the message “0”. A wireless receiver monitors the transmission 
and wants to detect the message. Regardless of what message is sent, noise 
is added to the received signal. Hence, the receiver should use the received 
signal to determine if there is a non-zero signal or only noise. There are two 
events in this example: i) there is no smoke, and ii) there is smoke. Since there 
are two possibilities, we call this a binary hypothesis test. 

In binary hypothesis testing, it is common to let the null hypothesis represent the case when the event of interest does not happen. It is denoted as $\mathcal{H}_0$. The opposite hypothesis is denoted as $\mathcal{H}_1$ and called the alternative hypothesis. Mathematically, we can express the corresponding detection problem as

$$
\mathcal{H}_0:\ y = n, \tag{2.181}
$$

$$
\mathcal{H}_1:\ y = 1 + n, \tag{2.182}
$$

where the detector determines if “1” is transmitted or not by observing $y$ and exploiting any other prior information, such as the statistical models of (2.181) and (2.182). In this section, we will assume that the additive noise is distributed as $n \sim \mathcal{N}(0, \sigma^2)$. The goal of detection theory is to select a detection performance metric and then develop the detection rule (i.e., selection rule between $\mathcal{H}_0$ and $\mathcal{H}_1$) that optimizes that metric. In the example above, the metric is the probability of making an incorrect detection, and the goal is to minimize it. If the a priori probabilities of transmitting 1 or nothing are defined and known, they can be used to minimize the error probability.

Figure 2.24: The PDF of the received signal $y$ under two hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$. Under the null hypothesis $\mathcal{H}_0$, the signal is distributed as $y \sim \mathcal{N}(0, 0.5)$ whereas it holds that $y \sim \mathcal{N}(1, 0.5)$ under the alternative hypothesis $\mathcal{H}_1$.

Figure 2.24 shows the PDFs of the received signal $y$ under the null hypothesis $\mathcal{H}_0$ and the alternative hypothesis $\mathcal{H}_1$. Under $\mathcal{H}_0$ it follows that $y = n \sim \mathcal{N}(0, \sigma^2)$, whereas under $\mathcal{H}_1$ we have $y = 1 + n \sim \mathcal{N}(1, \sigma^2)$. The figure shows the case when $\sigma^2 = 0.5$. Suppose we use a detector of the form

$$
\hat{\mathcal{H}} = \begin{cases} \mathcal{H}_1, & \text{if } y \geq \gamma, \\ \mathcal{H}_0, & \text{if } y < \gamma, \end{cases} \tag{2.183}
$$


where there is a threshold $\gamma$ that determines when to select each hypothesis. The two PDFs in Figure 2.24 intersect at $y = 1/2$, which will also happen for other values of $\sigma^2$. Hence, if we select the threshold as $\gamma = 1/2$, the detection rule in (2.183) will select the hypothesis most likely to have generated the received observation $y$. This threshold divides the decision region symmetrically into two parts, as illustrated by the red dashed line in Figure 2.25. This threshold maximizes the probability of making a correct detection if the two events are equally likely, which is seemingly a good performance metric. However, it is not the only metric of practical importance. Three other important metrics are:


• The detection probability, $P_D$, which is the correct detection probability when the event of interest happens, i.e., under hypothesis $\mathcal{H}_1$;

• The false alarm probability, $P_{FA}$, which is the wrong detection probability when the event of interest did not happen, i.e., under hypothesis $\mathcal{H}_0$;

• The missing probability, $P_M = 1 - P_D$, which is the wrong detection probability when the event happens, i.e., under hypothesis $\mathcal{H}_1$.


In Figure 2.25(a), the yellow and purple shaded regions represent the detection probability, $P_D$, and $1 - P_{FA}$, respectively (i.e., the areas under the curves equal the probabilities). When hypothesis $\mathcal{H}_1$ is true, we detect the event correctly when the received signal is greater than the threshold $\gamma = 1/2$, which happens with the probability $P_D$. On the other hand, when hypothesis $\mathcal{H}_0$ is true, we make the correct decision when the received signal is below the threshold, which happens with the probability $1 - P_{FA}$. Figure 2.25(b) shows the probabilities of incorrect detection. When $\mathcal{H}_1$ is true, but the noise takes a big negative realization so that the received signal is below the threshold, we miss the event, and the resulting probability is $P_M = 1 - P_D$. When $\mathcal{H}_0$ is true, but the noise takes a big positive realization so that the received signal is above the threshold, a false alarm occurs. The associated probability is $P_{FA}$.

It is good to have high values of $P_D$ (corresponding to low values of $P_M$) and low values of $P_{FA}$, but there is unfortunately always a tradeoff between these metrics. To illustrate this, we increase the threshold value to $\gamma = 1$ in Figure 2.26. As shown in Figure 2.26(a), the correct detection probability when there is no transmitted signal (i.e., $\mathcal{H}_0$ is true) increases compared to the last figure. Similarly, the false alarm probability decreases, as shown in Figure 2.26(b). However, this improvement is associated with a decrease in $P_D$ since a larger threshold makes it less likely to make the correct detection decision when a signal is transmitted (i.e., $\mathcal{H}_1$ is true). Moreover, the missing probability $P_M = 1 - P_D$ increases when $P_D$ decreases.
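The tradeoff can be quantified with the Gaussian CDF. For the detector (2.183) with noise variance $\sigma^2$, $P_D = \Pr\{1 + n \geq \gamma\}$ and $P_{FA} = \Pr\{n \geq \gamma\}$; the sketch below evaluates both with `statistics.NormalDist` from the Python standard library for the two thresholds used in Figures 2.25 and 2.26 (the function name is hypothetical).

```python
from statistics import NormalDist

def detection_metrics(threshold, sigma2):
    """P_D and P_FA of detector (2.183) for H0: y = n, H1: y = 1 + n, n ~ N(0, sigma2)."""
    sigma = sigma2 ** 0.5
    std_normal = NormalDist()
    P_D = 1 - std_normal.cdf((threshold - 1) / sigma)  # Pr{y >= threshold | H1}
    P_FA = 1 - std_normal.cdf(threshold / sigma)       # Pr{y >= threshold | H0}
    return P_D, P_FA

for threshold in (0.5, 1.0):
    P_D, P_FA = detection_metrics(threshold, 0.5)
    print(f"threshold={threshold}: P_D={P_D:.3f}, P_M={1 - P_D:.3f}, P_FA={P_FA:.3f}")
```

Raising the threshold lowers $P_{FA}$ but also lowers $P_D$; with $\gamma = 1$, $P_D$ equals exactly 1/2 because the threshold sits at the mean of $y$ under $\mathcal{H}_1$.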

The fact that there are multiple conflicting design objectives implies that 
we need to actively design the decision rule for every detection application, 
even if the underlying mathematical models are the same. For example, a 
fire-alarm sensor might be designed to have a very high detection probability, 
$P_D$, since missing the event of interest can be dangerous. On the other hand,
a radar surveillance system might be designed to have a very low false alarm 
probability, so it only identifies large objects. 

The hypothesis testing we have considered so far assumed that the PDF of the received signal is fully known for all the hypotheses, which is known as simple hypothesis testing. For instance, in the example above, we know that $y \sim \mathcal{N}(0, 0.5)$ when $\mathcal{H}_0$ is true, whereas $y \sim \mathcal{N}(1, 0.5)$ when $\mathcal{H}_1$ is true. We will focus on simple hypothesis testing in this book. Another class of problems is composite hypothesis tests in which there are unknown deterministic parameters or random variables with unknown distributions. An example of this is the detection problem in (2.181)-(2.182) when the noise variance $\sigma^2$ is unknown; the PDF of $y$ is unknown for all the hypotheses because we only know the Gaussian shape but not the associated variance.
(a) The correct detection probability under $\mathcal{H}_0$ is $1 - P_{FA}$ (the area of the purple region), while it is $P_D$ under $\mathcal{H}_1$ (the area of the yellow region).
(b) The wrong detection probability under $\mathcal{H}_0$ is $P_{FA}$ (the area of the yellow region), while it is $1 - P_D$ under $\mathcal{H}_1$ (the area of the purple region).

Figure 2.25: The probabilities of correct and incorrect detection under the hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$ when the detection threshold is 1/2. The dashed red line shows the corresponding detection boundary. The areas of the shaded regions represent the respective probabilities.

(a) The correct detection probability under $\mathcal{H}_0$ is $1 - P_{FA}$ (the area of the purple region), while it is $P_D$ under $\mathcal{H}_1$ (the area of the yellow region).
(b) The wrong detection probability under $\mathcal{H}_0$ is $P_{FA}$ (the area of the yellow region), while it is $1 - P_D$ under $\mathcal{H}_1$ (the area of the purple region).

Figure 2.26: The probabilities of correct and incorrect detection under the hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$ when the detection threshold is 1. The dashed red line shows the corresponding detection boundary. The areas of the shaded regions represent the respective probabilities.
Example 2.17. Consider the binary hypothesis test

$$
\mathcal{H}_0:\ y = n, \tag{2.184}
$$

$$
\mathcal{H}_1:\ y = x + n, \tag{2.185}
$$

where $x \sim \mathcal{N}_{\mathbb{C}}(0, P)$ is the transmitted signal under the hypothesis $\mathcal{H}_1$ and $n \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ is independent receiver noise. The random signal $x$ is unknown at the detector, but the transmit power $P$ and noise variance $\sigma^2$ are known. Is this a simple or composite hypothesis test?

We need to determine if the PDF of the received signal $y$ is known under all hypotheses. When $\mathcal{H}_0$ is true, it follows that $y \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$, so the distribution is known. When $\mathcal{H}_1$ is true, it follows that $y \sim \mathcal{N}_{\mathbb{C}}(0, P + \sigma^2)$, so this distribution is also known. Hence, we have full knowledge of the PDF of the received signal in both cases, which implies that this hypothesis test belongs to the “simple” category.


In the following sections, we will consider two approaches to simple hy- 
pothesis testing. The fundamental difference is whether the occurrences of the 
different events are modeled statistically or not. 


2.7.1 Bayesian Detection 


In the Bayesian detection approach, we assume that the occurrence of each hypothesis can be modeled statistically and has a specific probability. This approach is particularly useful when the underlying events happen repeatedly so that statistics can be inferred as described in Section 2.6, and the detector will be applied many times so that its average performance is essential. Consider a binary hypothesis test where $\Pr\{\mathcal{H}_0\}$ and $\Pr\{\mathcal{H}_1\}$ denote the probabilities that the hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$ take place, respectively. In detection problems where we know these probabilities (e.g., communication tasks where the messages are designed to be equally likely), it is of interest to minimize the error probability, which is defined as

$$
P_e = \Pr\{\mathcal{H}_0\}\underbrace{\Pr\{\hat{\mathcal{H}} = \mathcal{H}_1 | \mathcal{H}_0\}}_{= P_{FA}} + \Pr\{\mathcal{H}_1\}\underbrace{\Pr\{\hat{\mathcal{H}} = \mathcal{H}_0 | \mathcal{H}_1\}}_{= P_M = 1 - P_D}, \tag{2.186}
$$

where the conditional probability $\Pr\{\hat{\mathcal{H}} = \mathcal{H}_1 | \mathcal{H}_0\}$ is the probability of detecting the hypothesis $\mathcal{H}_1$ when $\mathcal{H}_0$ is true, which we previously called the false alarm probability, $P_{FA}$. Similarly, $\Pr\{\hat{\mathcal{H}} = \mathcal{H}_0 | \mathcal{H}_1\}$ is the conditional probability of selecting the hypothesis $\mathcal{H}_0$ when $\mathcal{H}_1$ is true, which we previously called the missing probability, $P_M = 1 - P_D$. The detector that minimizes the error probability, $P_e$, is as follows [48, Ch. 3].
Lemma 2.13. The detector that minimizes the error probability in (2.186) selects the hypothesis $\mathcal{H}_1$ if

$$
\frac{f_{y|\mathcal{H}_1}(y|\mathcal{H}_1)}{f_{y|\mathcal{H}_0}(y|\mathcal{H}_0)} > \frac{\Pr\{\mathcal{H}_0\}}{\Pr\{\mathcal{H}_1\}} = \gamma, \tag{2.187}
$$

where $f_{y|\mathcal{H}_1}(y|\mathcal{H}_1)$ and $f_{y|\mathcal{H}_0}(y|\mathcal{H}_0)$ denote the conditional PDFs of the received signal $y$ when $\mathcal{H}_1$ and $\mathcal{H}_0$ are true, respectively.

The ratio of the conditional PDFs on the left-hand side of (2.187) is called the likelihood ratio. The detector that minimizes $P_e$ compares it to the threshold $\gamma$, which is the ratio of the a priori probabilities of the hypotheses. The threshold is 1 when the hypotheses are equally likely. On the other hand, when hypothesis $\mathcal{H}_1$ is more likely, $\gamma$ is smaller than 1 to decrease the missing probability, $P_M$, since its contribution to (2.186) is more dominant compared to the false alarm probability. When hypothesis $\mathcal{H}_0$ is more likely, the optimal $\gamma$ is greater than 1 to force $P_{FA}$ to become smaller.


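Example 2.18. Consider the binary hypothesis test in (2.181)-(2.182). For a given value of $\gamma = \Pr\{\mathcal{H}_0\}/\Pr\{\mathcal{H}_1\}$, derive the Bayesian detector that minimizes the error probability. What are $P_D$ and $P_{FA}$ for this detector?

The received signal $y$ is distributed as $y \sim \mathcal{N}(1, \sigma^2)$ when $\mathcal{H}_1$ is true. On the other hand, it is distributed as $y \sim \mathcal{N}(0, \sigma^2)$ when $\mathcal{H}_0$ is true. Inserting the respective Gaussian distributions from (2.63) into the likelihood ratio in (2.187), we obtain the minimum-$P_e$ detector as

$$
\frac{\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y-1)^2}{2\sigma^2}}}{\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{y^2}{2\sigma^2}}} = e^{\frac{2y-1}{2\sigma^2}} > \gamma \quad \Longleftrightarrow \quad \frac{2y-1}{2\sigma^2} > \ln(\gamma) \quad \Longleftrightarrow \quad y > \sigma^2\ln(\gamma) + \frac{1}{2}, \tag{2.188}
$$

where we used the fact that $\ln(\cdot)$ is a monotonically increasing function on the positive real numbers, so it can be applied to both sides of the inequality without changing the inequality sign. By using the notation $\gamma' = \sigma^2\ln(\gamma) + \frac{1}{2}$, the detection probability is associated with the event $y > \gamma'$ and computed as

$$
P_D = \int_{\gamma'}^{\infty} f_{y|\mathcal{H}_1}(y|\mathcal{H}_1)\,\partial y = \int_{\gamma'}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y-1)^2}{2\sigma^2}}\,\partial y. \tag{2.189}
$$

Similarly, the false alarm probability is associated with the event $y > \gamma'$ and is computed using $f_{y|\mathcal{H}_0}(y|\mathcal{H}_0)$ as

$$
P_{FA} = \int_{\gamma'}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}}\,\partial y. \tag{2.190}
$$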
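The closed-form results of Example 2.18 can be cross-checked with the Monte Carlo method from Section 2.6. The sketch applies Lemma 2.13 literally, by evaluating the likelihood ratio with `statistics.NormalDist.pdf`, and compares the simulated $P_D$ and $P_{FA}$ with the expressions obtained from the threshold form $y > \gamma'$. The parameter values $\sigma^2 = 0.5$ and $\gamma = 2$ are illustrative choices.

```python
import math
import random
from statistics import NormalDist

sigma2, gamma = 0.5, 2.0  # illustrative values
sigma = math.sqrt(sigma2)
pdf_H0, pdf_H1 = NormalDist(0.0, sigma), NormalDist(1.0, sigma)

def decide_H1(y):
    """Lemma 2.13 applied literally: likelihood ratio compared with gamma."""
    return pdf_H1.pdf(y) / pdf_H0.pdf(y) > gamma

# Closed-form expressions from (2.188)-(2.190)
gamma_prime = sigma2 * math.log(gamma) + 0.5
P_D = 1 - NormalDist().cdf((gamma_prime - 1) / sigma)
P_FA = 1 - NormalDist().cdf(gamma_prime / sigma)

# Monte Carlo estimates with independent draws under each hypothesis
rng = random.Random(8)
L = 100_000
P_D_mc = sum(decide_H1(1 + rng.gauss(0.0, sigma)) for _ in range(L)) / L
P_FA_mc = sum(decide_H1(rng.gauss(0.0, sigma)) for _ in range(L)) / L
print(f"P_D:  {P_D:.4f} (closed form) vs {P_D_mc:.4f} (simulated)")
print(f"P_FA: {P_FA:.4f} (closed form) vs {P_FA_mc:.4f} (simulated)")
```

The likelihood-ratio rule and the scalar threshold $\gamma'$ make identical decisions, which is what (2.188) states.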
In Figure 2.27, we show the missing probability ($P_M = 1 - P_D$), the false alarm probability ($P_{FA}$), and the error probability ($P_e$) for the considered binary hypothesis test, as a function of the threshold $\gamma$. The curves are generated using the formulas derived in Example 2.18. In Figure 2.27(a), we consider equally likely hypotheses (i.e., $\Pr\{\mathcal{H}_0\} = \Pr\{\mathcal{H}_1\} = \frac{1}{2}$). The $P_M$-curve increases with $\gamma$, while the $P_{FA}$-curve decreases. The error probability is a weighted sum of these metrics, so it goes down and then up again when $\gamma$ increases. The threshold that minimizes $P_e$ is $\gamma = \Pr\{\mathcal{H}_0\}/\Pr\{\mathcal{H}_1\} = 1$, which is denoted by a cross in the figure. We notice that the optimal threshold occurs where $P_M = P_{FA}$, which can be proved analytically. As the threshold increases beyond 1, $P_{FA}$ decreases but $P_M$ increases faster, which leads to an increased error probability $P_e$. If $\gamma$ instead becomes smaller than 1, then $P_M$ decreases but $P_{FA}$ increases faster, leading to an increased error probability.

In Figure 2.27(b), we set $\Pr\{\mathcal{H}_1\} = \frac{1}{3}$ and $\Pr\{\mathcal{H}_0\} = \frac{2}{3}$, which leads to the optimal threshold $\gamma = \Pr\{\mathcal{H}_0\}/\Pr\{\mathcal{H}_1\} = 2$. The figure confirms that the minimum error probability is obtained when $\gamma = 2$. $P_{FA}$ is less than $P_M$ at this point, which is expected: in (2.186), $P_{FA}$ is multiplied by $\Pr\{\mathcal{H}_0\}$, which is larger than the weight $\Pr\{\mathcal{H}_1\}$ of $P_M$, so keeping $P_{FA}$ small matters more.
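Figure 2.27 can be checked numerically by sweeping $\gamma$ and locating the minimum of (2.186). The sketch below uses the closed-form $P_{FA}$ and $P_M$ from Example 2.18 with $\sigma^2 = 0.5$; the grid and the function name are our own illustrative choices.

```python
import math
from statistics import NormalDist

def error_probability(gamma, prior_H0, sigma2):
    """P_e in (2.186) for the threshold gamma in (2.187), with y = n or y = 1 + n."""
    sigma = math.sqrt(sigma2)
    gamma_prime = sigma2 * math.log(gamma) + 0.5      # equivalent threshold on y, see (2.188)
    P_FA = 1 - NormalDist().cdf(gamma_prime / sigma)
    P_M = NormalDist().cdf((gamma_prime - 1) / sigma)  # P_M = 1 - P_D
    return prior_H0 * P_FA + (1 - prior_H0) * P_M, P_FA, P_M

grid = [0.01 * k for k in range(10, 501)]  # gamma from 0.1 to 5
for prior_H0 in (1 / 2, 2 / 3):
    best = min(grid, key=lambda g: error_probability(g, prior_H0, 0.5)[0])
    print(f"Pr(H0)={prior_H0:.2f}: best gamma on grid = {best:.2f}, "
          f"Lemma 2.13 gives {prior_H0 / (1 - prior_H0):.2f}")
```

The grid minimizer should land within one grid step of $\Pr\{\mathcal{H}_0\}/\Pr\{\mathcal{H}_1\}$, and at $\gamma = 1$ with equal priors the two error types are equally likely.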


2.7.2 Neyman-Pearson Detection 


There are situations when prior information about the hypothesis probabilities is unavailable, either because the statistics are hard to obtain or because the events only occur once, so statistical modeling is not viable. We can then follow the Neyman-Pearson detection approach where a priori probabilities of the hypotheses are not considered. This approach is common in radar applications; for example, in target detection, it is hard to set a probability for the existence of a target. Instead, a desired value of $P_{FA} = \alpha$ is set, and the detector is designed to maximize $P_D$ under the condition that $P_{FA} = \alpha$. The detector that maximizes the detection probability in such a constrained detection problem is as follows [48, Ch. 3].

Lemma 2.14. The detector that maximizes the detection probability, $P_D$, under the constraint that $P_{FA} = \alpha$ selects the hypothesis $\mathcal{H}_1$ if

$$
\frac{f_{y|\mathcal{H}_1}(y|\mathcal{H}_1)}{f_{y|\mathcal{H}_0}(y|\mathcal{H}_0)} > \gamma, \tag{2.191}
$$

where the threshold $\gamma$ is selected to satisfy

$$
P_{FA} = \int_{\left\{y:\ \frac{f_{y|\mathcal{H}_1}(y|\mathcal{H}_1)}{f_{y|\mathcal{H}_0}(y|\mathcal{H}_0)} > \gamma\right\}} f_{y|\mathcal{H}_0}(y|\mathcal{H}_0)\,\partial y = \alpha. \tag{2.192}
$$
(b) $\Pr\{\mathcal{H}_0\} = 2\Pr\{\mathcal{H}_1\} = \frac{2}{3}$ and the optimal threshold is $\gamma = 2$.

Figure 2.27: The missing probability ($P_M$), the false alarm probability ($P_{FA}$), and the error probability ($P_e$) as a function of the threshold $\gamma$ for the binary hypothesis test in Example 2.18 with $\sigma^2 = 0.5$. The cross shows the threshold from Lemma 2.13 that minimizes the error probability: $\gamma = \Pr\{\mathcal{H}_0\}/\Pr\{\mathcal{H}_1\}$.
Example 2.19. Consider the binary hypothesis test in (2.181)-(2.182). Derive the Neyman-Pearson detector that satisfies $P_{FA} = \alpha$. What is $P_D$ for this detector?

The condition in (2.191) appeared already in the Bayesian detector but for a predefined value of $\gamma$. We now need to find the value that gives equality in (2.192). If we rewrite this condition in the way previously done in (2.188), our goal becomes to find the value of $\gamma'$ that results in $P_{FA} = \alpha$. This value is found by solving the equation

$$
P_{FA} = \int_{\gamma'}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}}\,\partial y = 1 - F_{\mathcal{N}}\!\left(\frac{\gamma'}{\sigma}\right) = \alpha, \tag{2.193}
$$

where $F_{\mathcal{N}}(\cdot)$ denotes the CDF of the standard Gaussian distribution with zero mean and variance 1. Since this CDF is continuous and strictly increasing, it is invertible, so we can solve for $\gamma'/\sigma$ and obtain $\gamma' = \sigma F_{\mathcal{N}}^{-1}(1 - \alpha)$. In conclusion, the Neyman-Pearson detector selects the hypothesis $\mathcal{H}_1$ if $y > \sigma F_{\mathcal{N}}^{-1}(1 - \alpha)$ and selects $\mathcal{H}_0$ otherwise. If we insert that value into (2.189), we obtain the detection probability

$$
P_D = \int_{\gamma'}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y-1)^2}{2\sigma^2}}\,\partial y = 1 - F_{\mathcal{N}}\!\left(F_{\mathcal{N}}^{-1}(1 - \alpha) - \sigma^{-1}\right), \tag{2.194}
$$

where we made a change of integration variable from $y$ to $(y - 1)/\sigma$ when obtaining the final result.
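The Neyman-Pearson threshold in Example 2.19 only requires the inverse standard Gaussian CDF, which Python's `statistics.NormalDist.inv_cdf` provides. The sketch below computes $\gamma'$ and $P_D$ from (2.193)-(2.194) and then checks by simulation that the false alarm constraint is met; $\alpha = 0.01$ and $\sigma^2 = 0.25$ are illustrative values, and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist

def neyman_pearson(alpha, sigma2):
    """Threshold gamma' and P_D from Example 2.19 for y = n versus y = 1 + n."""
    sigma = math.sqrt(sigma2)
    std_normal = NormalDist()
    q = std_normal.inv_cdf(1 - alpha)          # F_N^{-1}(1 - alpha)
    gamma_prime = sigma * q                    # solves (2.193)
    P_D = 1 - std_normal.cdf(q - 1 / sigma)    # (2.194)
    return gamma_prime, P_D

alpha, sigma2 = 0.01, 0.25
gamma_prime, P_D = neyman_pearson(alpha, sigma2)

# Monte Carlo check of the false alarm constraint and of P_D
rng = random.Random(10)
L = 200_000
sigma = math.sqrt(sigma2)
P_FA_mc = sum(rng.gauss(0.0, sigma) > gamma_prime for _ in range(L)) / L
P_D_mc = sum(1 + rng.gauss(0.0, sigma) > gamma_prime for _ in range(L)) / L
print(f"gamma' = {gamma_prime:.4f}, simulated P_FA = {P_FA_mc:.4f}, "
      f"P_D = {P_D:.4f} (closed form) vs {P_D_mc:.4f} (simulated)")
```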


We can use the Neyman-Pearson detector to handle the binary hypothesis test in (2.181)-(2.182), using the formulas derived in Example 2.19. Figure 2.28 shows how the detection probability, $P_D$, varies with the SNR. Three different false alarm probabilities are considered: $\alpha = 10^{-1}$, $\alpha = 10^{-3}$, and $\alpha = 10^{-5}$. Since the signal of interest is 1 under $\mathcal{H}_1$, the SNR is defined as $\mathrm{SNR} = 1/\sigma^2$. The detection probability improves as the SNR increases for any given value of $P_{FA}$. We notice that $P_D$ is higher when the false alarm probability is set to a higher value. This is expected since the challenge in detection is to handle uncertain cases. If we select $\mathcal{H}_1$ for most of these cases, we get a high value of $P_D$ but also many false alarms. When the desired value of $P_{FA}$ is smaller, a higher SNR is needed to achieve the same $P_D$.
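The curves in Figure 2.28 follow from (2.194) with $1/\sigma = \sqrt{\mathrm{SNR}}$. A short sketch that tabulates $P_D$ over an illustrative SNR grid from 0 to 20 dB for three values of $\alpha$ (the function name is hypothetical):

```python
from statistics import NormalDist

def np_detection_probability(alpha, snr):
    """(2.194) with SNR = 1/sigma^2, i.e., 1/sigma = sqrt(SNR)."""
    std_normal = NormalDist()
    return 1 - std_normal.cdf(std_normal.inv_cdf(1 - alpha) - snr ** 0.5)

snr_db_values = range(0, 21, 2)
table = {alpha: [np_detection_probability(alpha, 10 ** (d / 10)) for d in snr_db_values]
         for alpha in (1e-1, 1e-3, 1e-5)}
for alpha, row in table.items():
    print(f"alpha={alpha:.0e}: " + " ".join(f"{p:.3f}" for p in row))
```

Each row increases with the SNR, and for a fixed SNR a smaller $\alpha$ gives a smaller $P_D$, matching the ordering of the curves described above.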


2.8 Frequency Domain and Discrete Fourier Transform 


Wireless signals can be equivalently represented in the time domain and the frequency domain. The Fourier transform was used earlier in this chapter to obtain the frequency-domain representation of continuous-time signals. In this section, we will describe the mathematical transformation between these domains for discrete signals. In particular, we will define the discrete Fourier transform (DFT) and describe how it can be utilized to analyze the frequency content of a sampled time-domain signal of finite length. In communication systems that operate over a large bandwidth, it is common to insert the data content into the frequency-domain representation of the signal instead of the time-domain representation. One reason is to efficiently handle channels whose characteristics change substantially over the signal bandwidth. We will provide key results regarding the DFT and inverse DFT (IDFT) that will be utilized in later chapters.

Figure 2.28: The detection probability, $P_D$, versus the $\mathrm{SNR} = 1/\sigma^2$, for three different values of $P_{FA}$. The Neyman-Pearson detector is used for the binary hypothesis test in Example 2.19.

Consider an $S$-length sequence $x[0],\ldots,x[S-1]$ with samples of a time-domain signal. The DFT of this sequence is a sequence $\bar{x}[0],\ldots,\bar{x}[S-1]$ of equal length that describes the frequency-domain content and is given by

$$\bar{x}[\nu] = \mathcal{F}_S\{x[s]\} = \frac{1}{\sqrt{S}}\sum_{s=0}^{S-1} x[s]\,e^{-j2\pi s\nu/S} \quad \text{for } \nu = 0,\ldots,S-1. \tag{2.195}$$

The constant $1/\sqrt{S}$ in (2.195) ensures that the energy is the same in both the time-domain sequence and the corresponding frequency-domain sequence:

$$\sum_{s=0}^{S-1} |x[s]|^2 = \sum_{\nu=0}^{S-1} |\bar{x}[\nu]|^2, \tag{2.196}$$

which is known as Parseval's relation. Many other textbooks omit this scaling factor, which results in an energy mismatch that must be compensated for when taking the IDFT. However, the scaling factor is vital in communications since the signal energy is constrained, and we want to be able to measure it in both time and frequency. The IDFT of the sequence $\bar{x}[0],\ldots,\bar{x}[S-1]$ is computed as

$$x[s] = \mathcal{F}_S^{-1}\{\bar{x}[\nu]\} = \frac{1}{\sqrt{S}}\sum_{\nu=0}^{S-1} \bar{x}[\nu]\,e^{j2\pi s\nu/S} \quad \text{for } s = 0,\ldots,S-1 \tag{2.197}$$

and returns the original time-domain sequence.


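The DFT is a linear transform, which can be seen by defining the $S \times S$ DFT matrix

$$\mathbf{F}_S = \frac{1}{\sqrt{S}}\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_S & \omega_S^{2} & \cdots & \omega_S^{S-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_S^{S-1} & \omega_S^{2(S-1)} & \cdots & \omega_S^{(S-1)(S-1)} \end{bmatrix}, \tag{2.198}$$

where $\omega_S = e^{-j2\pi/S}$. We can use $\mathbf{F}_S$ to write the DFT in (2.195) in vector/matrix form as

$$\underbrace{\begin{bmatrix} \bar{x}[0] \\ \vdots \\ \bar{x}[S-1] \end{bmatrix}}_{=\bar{\mathbf{x}}} = \mathbf{F}_S \underbrace{\begin{bmatrix} x[0] \\ \vdots \\ x[S-1] \end{bmatrix}}_{=\mathbf{x}}, \tag{2.199}$$

or more concisely as $\bar{\mathbf{x}} = \mathbf{F}_S\mathbf{x}$. The DFT matrix is unitary (i.e., $\mathbf{F}_S^{\mathrm{H}}\mathbf{F}_S = \mathbf{F}_S\mathbf{F}_S^{\mathrm{H}} = \mathbf{I}_S$), thus the IDFT can be obtained from (2.199) by multiplying with the IDFT matrix $\mathbf{F}_S^{\mathrm{H}}$ from the left-hand side:

$$\mathbf{x} = \mathbf{F}_S^{\mathrm{H}}\bar{\mathbf{x}}. \tag{2.200}$$

The columns of $\mathbf{F}_S^{\mathrm{H}}$ form an orthonormal basis in $\mathbb{C}^S$ since the DFT matrix is unitary. Any signal vector $\mathbf{x}$ can be expressed in this basis, and the basis vectors can be shown to represent a set of specific signal frequencies.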
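The matrix form (2.198) to (2.200) can be sketched in the same way. This version, again plain Python with our own helper names (`dft_matrix`, `hermitian`, `matmul`, `matvec`), builds $\mathbf{F}_S$ entry by entry and checks unitarity numerically for the arbitrary choice $S = 6$.

```python
import cmath
import math

def dft_matrix(S):
    """S x S DFT matrix F_S of (2.198); entry (nu, s) is omega_S**(nu*s) / sqrt(S)."""
    return [[cmath.exp(-2j * math.pi * nu * s / S) / math.sqrt(S) for s in range(S)]
            for nu in range(S)]

def hermitian(M):
    """Conjugate transpose M^H of a matrix stored as a list of rows."""
    return [[M[c][r].conjugate() for c in range(len(M))] for r in range(len(M[0]))]

def matmul(A, B):
    """Plain matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

S = 6
F = dft_matrix(S)
G = matmul(hermitian(F), F)  # F_S^H F_S, which should be the identity I_S
err = max(abs(G[i][j] - (1.0 if i == j else 0.0)) for i in range(S) for j in range(S))
print(err)  # close to zero

x = [1.0, -2.0, 0.5j, 3.0, 0.0, -1.0j]
xbar = matvec(F, x)                  # (2.199): xbar = F_S x
x_back = matvec(hermitian(F), xbar)  # (2.200): x = F_S^H xbar
print(max(abs(a - b) for a, b in zip(x, x_back)))  # close to zero
```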


2.8.1 Interpretation of Signal Frequencies 


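Any $S$-length signal can be represented by a vector $\mathbf{x} = [x[0],\ldots,x[S-1]]^{\mathrm{T}} \in \mathbb{C}^S$. The IDFT formula in (2.200) shows that this vector can also be represented as a linear combination of the columns of $\mathbf{F}_S^{\mathrm{H}}$ with the coefficients given by the DFT vector $\bar{\mathbf{x}}$. The columns of $\mathbf{F}_S^{\mathrm{H}}$ take the role of an orthonormal basis in $\mathbb{C}^S$ and are not selected arbitrarily but to represent different signal frequencies. If we count the columns of $\mathbf{F}_S^{\mathrm{H}}$ from $0$ to $S-1$, then column $\nu \in \{0,\ldots,S-1\}$ is

$$\frac{1}{\sqrt{S}}\begin{bmatrix} 1 \\ (\omega_S^{*})^{\nu} \\ \vdots \\ (\omega_S^{*})^{(S-1)\nu} \end{bmatrix} = \frac{1}{\sqrt{S}}\begin{bmatrix} e^{j2\pi\frac{\nu}{S}\cdot 0} \\ e^{j2\pi\frac{\nu}{S}\cdot 1} \\ \vdots \\ e^{j2\pi\frac{\nu}{S}(S-1)} \end{bmatrix}. \tag{2.201}$$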
This basis vector contains $S$ equally spaced samples of the complex exponential $e^{j2\pi\frac{\nu}{S}l}$ with the normalized frequency $\nu/S$ and the integer sample times $l = 0,1,\ldots,S-1$. The frequency is normalized in the sense that the time between the samples is unspecified. Suppose the discrete samples are obtained from a complex exponential $e^{j2\pi f t}$ with the frequency $f$ Hz and time variable $t \in \mathbb{R}$. In that case, we need to know the sampling rate (samples/second) to connect the normalized frequency to the original frequency.

We considered the normalized frequencies $\nu/S \in \{0, 1/S, \ldots, (S-1)/S\}$ between 0 and 1 when computing the IDFT in (2.197), but we can equally well consider another interval of length 1. The reason is that the complex exponential $e^{j2\pi\frac{\nu}{S}l}$ is a periodic function of $\nu/S$ with period 1. In particular, we obtain the same column in (2.201) with $\nu/S$ and $\nu/S + n$ for any integer $n$ because

$$e^{j2\pi\left(\frac{\nu}{S}+n\right)l} = e^{j2\pi\frac{\nu}{S}l}\underbrace{e^{j2\pi nl}}_{=1} = e^{j2\pi\frac{\nu}{S}l} \tag{2.202}$$

when $l$ is an integer. Since positive and negative frequencies often come in pairs in practical signals (e.g., in the complex baseband representation), it is common to consider a symmetric frequency interval such as $f \in [-B/2, B/2)$, where the upper limit is excluded so that the Nyquist-Shannon sampling theorem stated in Lemma 2.8 is satisfied. Hence, it can be convenient to utilize the normalized frequency interval $\bar{f} \in [-1/2, 1/2)$, which is also symmetric around zero. There is then a simple bijective mapping where the sampling of a signal with the original frequency $f$ results in the normalized frequency

$$\bar{f} = \frac{f}{B} \tag{2.203}$$

when the sampling rate is $B$ samples/second. Half of the normalized frequencies in $[-1/2, 1/2)$ are negative, and the concept of negative frequencies might seem illogical but is fundamentally important. The complex exponentials with the positive normalized frequency $\nu/S$ and with the negative counterpart $-\nu/S$ only differ by a complex conjugate:


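$$e^{-j2\pi\frac{\nu}{S}l} = \left(e^{j2\pi\frac{\nu}{S}l}\right)^{*}. \tag{2.204}$$

Hence, the real parts are equal, while the imaginary parts have opposite signs. Euler's formula in (2.3) can be utilized to create any discrete-time sinusoidal signal with the normalized frequency $\nu/S$ as a linear combination of $e^{j2\pi\frac{\nu}{S}l}$ and $e^{-j2\pi\frac{\nu}{S}l}$. For example, we can create the cosine and sine signals as

$$\cos\left(2\pi\frac{\nu}{S}l\right) = \frac{1}{2}e^{j2\pi\frac{\nu}{S}l} + \frac{1}{2}e^{-j2\pi\frac{\nu}{S}l}, \tag{2.205}$$

$$\sin\left(2\pi\frac{\nu}{S}l\right) = \frac{1}{2j}e^{j2\pi\frac{\nu}{S}l} - \frac{1}{2j}e^{-j2\pi\frac{\nu}{S}l}. \tag{2.206}$$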

This is why we need pairs of positive and negative frequencies to synthesize 
arbitrary signals using the IDFT. 
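To see the positive/negative frequency pairs of (2.205) and (2.206) in the DFT, the sketch below (plain Python, with an illustrative helper `dft` implementing (2.195)) transforms sampled cosine and sine signals with $\nu = 2$ and $S = 8$, values chosen only for illustration. Both DFTs should be non-zero only at $\nu = 2$ and $\nu = S - 2 = 6$, where the latter corresponds to the normalized frequency $6/8 - 1 = -2/8$.

```python
import cmath
import math

def dft(x):
    """Unitary DFT as in (2.195)."""
    S = len(x)
    return [sum(x[s] * cmath.exp(-2j * math.pi * s * nu / S) for s in range(S)) / math.sqrt(S)
            for nu in range(S)]

S, nu0 = 8, 2
cos_samples = [math.cos(2 * math.pi * nu0 * l / S) for l in range(S)]
sin_samples = [math.sin(2 * math.pi * nu0 * l / S) for l in range(S)]

cos_bar = dft(cos_samples)
sin_bar = dft(sin_samples)
# Indices with non-negligible magnitude: nu0 and S - nu0 (i.e., -nu0/S after wrapping)
print([nu for nu in range(S) if abs(cos_bar[nu]) > 1e-9])  # [2, 6]
print(cos_bar[2], cos_bar[6])  # both approximately sqrt(S)/2 = sqrt(2)
print(sin_bar[2], sin_bar[6])  # approximately -j*sqrt(2) and +j*sqrt(2)
```

The weights $1/2$ and $\pm 1/(2j)$ in (2.205) and (2.206) reappear here, scaled by $\sqrt{S}$ because the DFT of $e^{j2\pi\frac{\nu_0}{S}l}$ equals $\sqrt{S}$ at $\nu = \nu_0$.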
Figure 2.29: Illustration of how the positive range $\nu/S \in \{0,\ldots,(S-1)/S\}$ of normalized frequencies can be turned into the symmetric range in (2.207) with both positive and negative frequencies through a cyclic shift. $S = 10$ samples are considered in this example.


Example 2.20. Which are the $S$ normalized frequencies $\bar{f} \in [-1/2, 1/2)$ that the IDFT utilizes?

The columns of the IDFT matrix are generated by the normalized frequencies $\nu/S \in \{0,\ldots,(S-1)/S\}$. The lower half is in the intended range $[0, 1/2)$, while the upper half is in the interval $[1/2, 1)$, which lies outside the intended range. We can use the periodicity property from (2.202) to subtract 1 from these normalized frequencies and obtain an equivalent representation in the range $[-1/2, 0)$. The IDFT is therefore synthesizing signals using the following $S$ normalized frequencies $\bar{f}$ between $-1/2$ and $1/2$:

$$\bar{f} \in \left\{-\frac{\lceil (S-1)/2 \rceil}{S}, \ldots, -\frac{1}{S}, 0, \frac{1}{S}, \ldots, \frac{S-1-\lceil (S-1)/2 \rceil}{S}\right\} = \begin{cases} \left\{-\frac{1}{2}, \ldots, \frac{1}{2}-\frac{1}{S}\right\} & \text{if } S \text{ is even}, \\[1ex] \left\{-\frac{1}{2}+\frac{1}{2S}, \ldots, \frac{1}{2}-\frac{1}{2S}\right\} & \text{if } S \text{ is odd}, \end{cases} \tag{2.207}$$

where the operator $\lceil\cdot\rceil$ returns the closest integer larger than or equal to its argument. The first and last frequencies differ for even and odd values of $S$.


Figure 2.29 shows how to switch from the range $\nu/S \in \{0,\ldots,(S-1)/S\}$ of positive normalized frequencies to the symmetric range in (2.207) with both positive and negative frequencies. This is achieved through a cyclic shift where the upper half is moved to the beginning. The figure shows the case of $S = 10$, which is an even number, so $1/2$ is one of the original normalized frequencies (this will not happen if $S$ is odd). This frequency is equivalent to $1/2 - 1 = -1/2$, so we can put it either at the beginning or at the end of the symmetric range. We follow the convention of starting with $-1/2$ so that the cyclic shift divides the range into two equal halves and swaps their order.
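The cyclic shift in Figure 2.29 and the set in (2.207) can be sketched as follows. The function name is hypothetical; it starts from $\nu/S$, applies the periodicity (2.202) to the upper $\lceil (S-1)/2 \rceil$ frequencies, and moves them to the front.

```python
import math

def symmetric_normalized_frequencies(S):
    """Return the S normalized frequencies of (2.207), from the most negative one.

    Starts from nu/S for nu = 0, ..., S-1, subtracts 1 from the upper part
    (periodicity in (2.202)), and moves that part to the beginning (cyclic shift).
    """
    n_neg = math.ceil((S - 1) / 2)  # number of frequencies mapped into [-1/2, 0)
    positive_range = [nu / S for nu in range(S)]
    upper = [f - 1 for f in positive_range[S - n_neg:]]
    lower = positive_range[:S - n_neg]
    return upper + lower

print(symmetric_normalized_frequencies(10))  # starts at -0.5 and ends at 0.4
print(symmetric_normalized_frequencies(7))   # from -3/7 to 3/7
```

If NumPy is available, `numpy.fft.fftshift(numpy.fft.fftfreq(S))` should produce the same ordering, since `fftfreq` lists the frequencies in DFT order and `fftshift` performs this cyclic shift.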

Figure 2.30(a) shows the real and imaginary parts of $e^{j2\pi\frac{\nu}{S}l}$ for $\nu = 5$ and $S = 7$. The curves are drawn as a function of a continuous variable $l$, but the samples obtained at the integer times $l = 0,\ldots,6$ are marked with circles.




When the DFT is applied to these 7 time-domain samples, we obtain the vector $\bar{\mathbf{x}} = [0,\ldots,0,\sqrt{7},0]^{\mathrm{T}}$, where only the sixth entry, corresponding to $\nu = 5$, is non-zero. The normalized frequency is $\nu/S = 5/7$, which is not within the interval $[-1/2, 1/2)$ but can be identically represented within that range by $\bar{f} = 5/7 - 1 = -2/7$. Figure 2.30(b) shows the DFT representation of the signal using the set of normalized frequencies from (2.207) that are within the desired interval $[-1/2, 1/2)$.
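The example with $\nu = 5$ and $S = 7$ can be reproduced numerically. This sketch (plain Python, with our own `dft` helper implementing (2.195)) also maps each index to its equivalent frequency in $[-1/2, 1/2)$, as in Figure 2.30(b).

```python
import cmath
import math

def dft(x):
    """Unitary DFT as in (2.195)."""
    S = len(x)
    return [sum(x[s] * cmath.exp(-2j * math.pi * s * nu / S) for s in range(S)) / math.sqrt(S)
            for nu in range(S)]

S, nu0 = 7, 5
samples = [cmath.exp(2j * math.pi * nu0 * l / S) for l in range(S)]
xbar = dft(samples)
for nu, value in enumerate(xbar):
    f_pos = nu / S                                # frequency in [0, 1)
    f_sym = f_pos - 1 if f_pos >= 0.5 else f_pos  # equivalent frequency in [-1/2, 1/2)
    print(nu, round(f_sym, 4), round(abs(value), 4))
# Only nu = 5 (f_sym = -2/7, about -0.2857) has magnitude sqrt(7), about 2.6458.
```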

Figure 2.31 illustrates how the IDFT formula $\mathbf{x} = \mathbf{F}_S^{\mathrm{H}}\bar{\mathbf{x}}$ in (2.200) synthesizes the time-domain signal by showing how each column of $\mathbf{F}_S^{\mathrm{H}}$ contains samples of a complex exponential with a different frequency. The time axis points downwards, with positive values to the left of the vertical lines. We consider $S = 7$ as in the last figure. The curves in the first four columns are obtained for the non-negative normalized frequencies $0$, $1/7$, $2/7$, and $3/7$. The curves in the last three columns are obtained using the negative frequencies $-3/7$, $-2/7$, and $-1/7$, which are equivalent to the normalized frequencies $4/7$, $5/7$, and $6/7$ that are outside the range $[-1/2, 1/2)$. The color coding identifies the columns that oscillate at the same frequency except for a different sign, leading to the same real parts but inverted imaginary parts.

The considered time-domain signal $\mathbf{x}$ and its DFT $\bar{\mathbf{x}}$ are sequences of the same finite length $S$, but the DFT and IDFT definitions can be easily extended to infinite sequences. The IDFT formula in (2.197) can be evaluated for any integer $s$, but the resulting sequence is $S$-periodic: $x[s+S] = x[s]$ follows from the fact that $e^{j2\pi(s+S)\nu/S} = e^{j2\pi s\nu/S}e^{j2\pi\nu} = e^{j2\pi s\nu/S}$ for every integer $\nu$. This can be pictured by considering Figure 2.31 and adding rows to the matrix by extending the oscillating curves up and down. No additional frequencies would be added to the signal when doing that. Similarly, for any integer $\nu$, the DFT in (2.195) satisfies $\bar{x}[\nu+S] = \bar{x}[\nu]$ since $e^{-j2\pi s(\nu+S)/S} = e^{-j2\pi s\nu/S}e^{-j2\pi s} = e^{-j2\pi s\nu/S}$ for every integer $s$. This frequency-domain periodicity is the property we utilized when shifting the interval of normalized frequencies from $[0, 1)$ to $[-1/2, 1/2)$.
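The two periodicity properties can be checked by evaluating the sums in (2.195) and (2.197) outside the index range $0,\ldots,S-1$. The helper names `dft_at` and `idft_at` and the test vector are illustrative assumptions.

```python
import cmath
import math

def idft_at(xbar, s):
    """Evaluate the IDFT sum (2.197) at an arbitrary integer s."""
    S = len(xbar)
    return sum(xbar[nu] * cmath.exp(2j * math.pi * s * nu / S) for nu in range(S)) / math.sqrt(S)

def dft_at(x, nu):
    """Evaluate the DFT sum (2.195) at an arbitrary integer nu."""
    S = len(x)
    return sum(x[s] * cmath.exp(-2j * math.pi * s * nu / S) for s in range(S)) / math.sqrt(S)

x = [0.5, -1.0, 2.0j, 1.0 + 1.0j]
S = len(x)
xbar = [dft_at(x, nu) for nu in range(S)]

for s in range(-S, 2 * S):   # time-domain periodicity: x[s + S] = x[s]
    assert abs(idft_at(xbar, s + S) - idft_at(xbar, s)) < 1e-9
for nu in range(-S, 2 * S):  # frequency-domain periodicity: xbar[nu + S] = xbar[nu]
    assert abs(dft_at(x, nu + S) - dft_at(x, nu)) < 1e-9
print("both sequences are S-periodic")
```

In particular, `idft_at(xbar, -1)` returns (up to rounding) the last sample `x[S - 1]`, which is exactly the value that a cyclic prefix reuses.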


In summary, any $S$-length signal vector $\mathbf{x}$ can be expressed as a linear combination of (samples from) complex exponentials having the $S$ normalized frequencies stated in (2.207). This is why the DFT gives a frequency-domain representation, and the coefficients of the linear combination are stored in the DFT vector $\bar{\mathbf{x}}$.


2.8.2 Finite Impulse Response Filters 


The discrete-time representation of a communication system might contain the filtering of a signal sequence by a finite impulse response (FIR) filter, which might represent the communication channel in discrete time. A causal discrete-time FIR filter of order $T$ provides the output signal


$$y[k] = h[0]x[k] + h[1]x[k-1] + \ldots + h[T]x[k-T], \tag{2.208}$$

where $x[k]$ is the input signal and $h[0],\ldots,h[T]$ is the $(T+1)$-length discrete impulse response that characterizes the filter. Figure 2.32 illustrates the filtering operation in (2.208). The individual terms $h[\ell]$ are often called taps, and the entire filter can be referred to as a tapped delay line since the output contains delayed copies of the input multiplied by different taps.

Figure 2.30: The signal $e^{j2\pi\frac{\nu}{S}l}$ with $\nu = 5$ and $S = 7$ is sampled at the integer times $l = 0,1,\ldots,6$. The time-domain representation is shown in (a), and the frequency-domain representation is shown in (b) using the set of normalized frequencies from (2.207).

Figure 2.31: The IDFT formula in (2.200) is illustrated with a connection to the complex exponentials that are sampled to obtain the entries of $\mathbf{F}_S^{\mathrm{H}}$. The solid lines show the real parts, the dashed lines show the imaginary parts, the dotted vertical lines show the time axis, and the stars show the sampling points.

Figure 2.32: A block diagram of a discrete-time FIR filter of order $T$, which takes $x[k]$ as input and provides $y[k]$ as output.

When the signal sequence $x[0],\ldots,x[S-1]$ is sent as input to an FIR filter of order $T < S$, the output (2.208) can be expressed as a linear convolution (denoted by $*$) between the input sequence and the impulse response:

$$y[k] = (h * x)[k] = \sum_{\ell=0}^{T} h[\ell]\,x[k-\ell] \quad \text{for } k = 0,\ldots,S-1. \tag{2.209}$$

This equation also depends on the $T$ signal values $x[-T],\ldots,x[-1]$ sent before the actual transmission began. This is a major issue if we want to identify all input signal values from the output sequence $y[0],\ldots,y[S-1]$ because there are $S+T$ parameters to identify but only $S$ observations. Hence, controlling the content of the extra $T$ signal values is desirable to avoid transient effects where unknown signals are mixed with the intended ones to create an ill-posed signal identification problem. A simple solution is to actively send a prefix containing $x[-T],\ldots,x[-1]$ into the FIR filter before the actual intended transmission of $x[0],\ldots,x[S-1]$ begins. The prefix can be designed in different ways under the constraint that it does not introduce any additional unknown signal values: we can only handle $S$ unknowns when having $S$ observations. One option is to use a silent prefix represented by $x[-1] = \ldots = x[-T] = 0$, so the corresponding terms vanish from (2.209).
This prefix has the benefit of not increasing the total signal energy but has the drawback that we must design an inverse filter based on the channel taps to recover the input signal sequence. As will soon become apparent, a more convenient option is to add a cyclic prefix where we use values from the end of the sequence: $x[-1] = x[S-1]$, $x[-2] = x[S-2]$, and so on until $x[-T] = x[S-T]$. This option has the important consequence that the input signal sequence will appear to be periodic, in the sense that the received signal in (2.209) can be expressed as

$$y[k] = \sum_{\ell=0}^{T} h[\ell]\,x[k-\ell] = \sum_{\ell=0}^{T} h[\ell]\,x[(k-\ell)_{\mathrm{mod}\,S}] = (h \circledast x)[k] \quad \text{for } k = 0,\ldots,S-1, \tag{2.210}$$

where "mod $S$" is the modulo operation that adds $S$ to $k-\ell$ whenever needed to get a value between $0$ and $S-1$. Even if the FIR filter performs a linear convolution, the addition of the cyclic prefix makes the output signal mathematically equivalent to a cyclic convolution between $h[0],\ldots,h[T]$ and an infinite $S$-periodic extension of $x[0],\ldots,x[S-1]$. Recall that $S$-length sequences behave as $S$-periodic sequences when analyzed using the DFT, so this is the property that we want to maintain by adding the cyclic prefix. It is called cyclic (or circular) convolution since the modulo operation provides indices from the end of the signal sequence when $k-\ell$ is negative; for example, $(-1)_{\mathrm{mod}\,S} = S-1$, $(-2)_{\mathrm{mod}\,S} = S-2$, etc.
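The equivalence between a linear FIR filter fed with a cyclic prefix and the cyclic convolution in (2.210) can be illustrated with the sketch below. The taps and input values are hypothetical, and the function names are our own; the filter outputs produced while the prefix is being sent are simply discarded.

```python
def fir_filter(h, inp):
    """Causal FIR filter (2.208) on a finite input; samples before inp[0] are taken as zero."""
    T = len(h) - 1
    return [sum(h[l] * inp[k - l] for l in range(T + 1) if k - l >= 0) for k in range(len(inp))]

def cyclic_convolution(h, x):
    """(h circled-star x)[k] = sum_l h[l] x[(k - l) mod S], as in (2.210)."""
    S = len(x)
    return [sum(h[l] * x[(k - l) % S] for l in range(len(h))) for k in range(S)]

h = [1.0, 0.5, -0.25, 0.1]            # hypothetical taps, order T = 3
x = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0]  # S = 6 input samples
T, S = len(h) - 1, len(x)

with_prefix = x[S - T:] + x           # cyclic prefix x[S-T], ..., x[S-1] is sent first
out = fir_filter(h, with_prefix)[T:]  # drop the T outputs produced during the prefix
print(out)
print(cyclic_convolution(h, x))       # same values as the line above
```

With a silent (all-zero) prefix instead, `fir_filter(h, x)` generally differs from `cyclic_convolution(h, x)` in the first $T$ outputs.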
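The DFT of the output $y[0],\ldots,y[S-1]$ can be expressed as

$$\begin{aligned} \bar{y}[\nu] &= \frac{1}{\sqrt{S}}\sum_{s=0}^{S-1} y[s]\,e^{-j2\pi s\nu/S} = \frac{1}{\sqrt{S}}\sum_{s=0}^{S-1}\sum_{\ell=0}^{T} h[\ell]\,x[(s-\ell)_{\mathrm{mod}\,S}]\,e^{-j2\pi s\nu/S} \\ &= \sum_{\ell=0}^{T} h[\ell]\,e^{-j2\pi\ell\nu/S}\,\frac{1}{\sqrt{S}}\sum_{i=-\ell}^{S-1-\ell} x[i_{\mathrm{mod}\,S}]\,e^{-j2\pi i\nu/S} \end{aligned} \tag{2.211}$$

by changing the summation index from $s$ to $i = s-\ell$. We can further rewrite the expression by adding $S$ to all negative values of $i$ and exploiting the cyclic signal structure, which results in

$$\begin{aligned} \bar{y}[\nu] &= \sum_{\ell=0}^{T} h[\ell]\,e^{-j2\pi\ell\nu/S}\,\frac{1}{\sqrt{S}}\left(\sum_{i=0}^{S-1-\ell} x[i]\,e^{-j2\pi i\nu/S} + \sum_{i=S-\ell}^{S-1} x[i]\,e^{-j2\pi(i-S)\nu/S}\right) \\ &= \underbrace{\sum_{\ell=0}^{T} h[\ell]\,e^{-j2\pi\ell\nu/S}}_{=\bar{h}[\nu]}\;\underbrace{\frac{1}{\sqrt{S}}\sum_{i=0}^{S-1} x[i]\,e^{-j2\pi i\nu/S}}_{=\bar{x}[\nu]}, \end{aligned} \tag{2.212}$$

where the last equality follows by using that $e^{j2\pi S\nu/S} = 1$ since $\nu$ is an integer.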


The final expression in (2.212) shows that $\bar{y}[\nu]$ is the product between the DFT of the input signal and the frequency response of the FIR filter, defined as

$$\bar{h}[\nu] = \sum_{\ell=0}^{T} h[\ell]\,e^{-j2\pi\ell\nu/S} \quad \text{for } \nu = 0,\ldots,S-1. \tag{2.213}$$

The frequency response is defined similarly to the DFT of a signal, except for the lack of a $1/\sqrt{S}$ scaling factor. The property we derived above is known as the cyclic convolution theorem.


Lemma 2.15. Let $y[k] = (h \circledast x)[k]$ denote the $S$-length sequence obtained by cyclic convolution between the sequence $x[0],\ldots,x[S-1]$ and the FIR filter $h[0],\ldots,h[T]$ with order $T < S$. The DFT of $y[k]$ is given by

$$\bar{y}[\nu] = \bar{h}[\nu]\,\bar{x}[\nu], \quad \nu = 0,\ldots,S-1, \tag{2.214}$$

where $\bar{x}[\nu]$ is the DFT in (2.195) and $\bar{h}[\nu]$ is the frequency response in (2.213).


This lemma states that the DFT of a cyclic convolution between a signal 
sequence and the impulse response of a filter is the product of the respective 
frequency-domain representations. This is the discrete counterpart of the 
(perhaps) more widely used property that the continuous Fourier transform of 
the convolution between two functions is the product of the Fourier transforms 
of the respective functions. The practical consequence of this lemma is that we can identify the DFT of the input signal sequence by computing the DFT of the output signal sequence and then simply dividing $\bar{y}[\nu]$ in (2.214) by $\bar{h}[\nu]$, provided that $\bar{h}[\nu] \neq 0$.
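Lemma 2.15 and the division-based recovery of $\bar{x}[\nu]$ can be verified with the following sketch. The taps are a hypothetical example chosen so that $h[0]$ dominates, which guarantees $|\bar{h}[\nu]| \geq 1 - 0.5 - 0.25 - 0.1 = 0.15$ for all $\nu$; the division would fail if some $\bar{h}[\nu]$ were zero. All helper names are our own.

```python
import cmath
import math

def dft(x):
    """Unitary DFT as in (2.195)."""
    S = len(x)
    return [sum(x[s] * cmath.exp(-2j * math.pi * s * nu / S) for s in range(S)) / math.sqrt(S)
            for nu in range(S)]

def frequency_response(h, S):
    """(2.213): like a DFT of the taps, but without the 1/sqrt(S) factor."""
    return [sum(h[l] * cmath.exp(-2j * math.pi * l * nu / S) for l in range(len(h)))
            for nu in range(S)]

def cyclic_convolution(h, x):
    """(2.210) with the modulo-S indexing."""
    S = len(x)
    return [sum(h[l] * x[(k - l) % S] for l in range(len(h))) for k in range(S)]

h = [1.0, 0.5, -0.25, 0.1]                      # hypothetical taps (T = 3)
x = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0, 0.5, 1.5]  # S = 8
y = cyclic_convolution(h, x)

ybar, xbar, hbar = dft(y), dft(x), frequency_response(h, len(x))
# Lemma 2.15: ybar[nu] = hbar[nu] * xbar[nu]
print(max(abs(yb - hb * xb) for yb, hb, xb in zip(ybar, hbar, xbar)))  # close to zero
# Recover the input DFT by division (requires hbar[nu] != 0 for all nu)
xbar_hat = [yb / hb for yb, hb in zip(ybar, hbar)]
print(max(abs(a - b) for a, b in zip(xbar_hat, xbar)))                 # close to zero
```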


Many textbooks omit the $1/\sqrt{S}$ factor when defining the DFT to achieve symmetry between how signals and impulse responses are transformed to the frequency domain. As mentioned earlier, the drawback of that convention is that the signal energy will differ between the time and frequency domains, which we circumvent by using (2.195) for the DFT of a signal and (2.213) for the frequency response of an FIR filter.
The DFT operation is the same irrespective of the channel taps, which makes
it convenient to implement in hardware. We will return to this in Section 7.1.1 
when considering orthogonal frequency-division multiplexing (OFDM). 

We can also establish a matrix-vector representation of the FIR filter. If we begin by considering the cyclic convolution in (2.210) and assume $T = 3$ (for brevity), we can connect the $S$ outputs with the $S$ inputs as

$$\underbrace{\begin{bmatrix} y[0] \\ y[1] \\ \vdots \\ y[S-1] \end{bmatrix}}_{=\mathbf{y}} = \underbrace{\begin{bmatrix} h[0] & 0 & \cdots & 0 & h[3] & h[2] & h[1] \\ h[1] & h[0] & 0 & \cdots & 0 & h[3] & h[2] \\ h[2] & h[1] & h[0] & 0 & \cdots & 0 & h[3] \\ h[3] & h[2] & h[1] & h[0] & 0 & \cdots & 0 \\ 0 & h[3] & h[2] & h[1] & h[0] & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & h[3] & h[2] & h[1] & h[0] \end{bmatrix}}_{=\mathbf{C}_h} \underbrace{\begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[S-1] \end{bmatrix}}_{=\mathbf{x}}, \tag{2.215}$$

which can be written in short form as $\mathbf{y} = \mathbf{C}_h\mathbf{x}$. The filtering is carried out by the $S \times S$ matrix $\mathbf{C}_h$, where each row contains all the channel taps but shifted cyclically one entry to the right compared to the previous row. This kind of matrix is known as a circulant matrix and can be created for any value of $T < S$. Any such matrix can be viewed as the matrix representation of the cyclic convolution that an FIR filter carries out when the input has a cyclic prefix.
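A circulant matrix like the one in (2.215) can be generated directly from the taps, since entry $(k, i)$ equals $h[(k-i)_{\mathrm{mod}\,S}]$ with $h[\ell] = 0$ for $\ell > T$. The sketch below (hypothetical taps, our own helper names) builds $\mathbf{C}_h$ for $S = 7$ and compares $\mathbf{C}_h\mathbf{x}$ with the cyclic convolution.

```python
def circulant_matrix(h, S):
    """S x S circulant matrix C_h of (2.215): entry (k, i) is h[(k - i) mod S]."""
    taps = list(h) + [0.0] * (S - len(h))  # zero-pad the taps; requires T < S
    return [[taps[(k - i) % S] for i in range(S)] for k in range(S)]

def matvec(C, x):
    """Matrix-vector product C x."""
    return [sum(c * v for c, v in zip(row, x)) for row in C]

def cyclic_convolution(h, x):
    """(2.210) with the modulo-S indexing."""
    S = len(x)
    return [sum(h[l] * x[(k - l) % S] for l in range(len(h))) for k in range(S)]

h = [1.0, 0.5, -0.25, 0.1]                 # hypothetical taps, T = 3
x = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0, 0.5]  # S = 7
C = circulant_matrix(h, len(x))
for row in C:
    print(row)  # each row is the previous one shifted cyclically to the right
print(matvec(C, x))
print(cyclic_convolution(h, x))  # same values as C_h x (up to rounding)
```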

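Another matrix-vector representation can be established by considering the frequency-domain expression in (2.214), which we can write as

$$\underbrace{\begin{bmatrix} \bar{y}[0] \\ \vdots \\ \bar{y}[S-1] \end{bmatrix}}_{=\bar{\mathbf{y}}} = \underbrace{\begin{bmatrix} \bar{h}[0] & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \bar{h}[S-1] \end{bmatrix}}_{=\mathbf{D}_h} \underbrace{\begin{bmatrix} \bar{x}[0] \\ \vdots \\ \bar{x}[S-1] \end{bmatrix}}_{=\bar{\mathbf{x}}}, \tag{2.216}$$

or in short form as $\bar{\mathbf{y}} = \mathbf{D}_h\bar{\mathbf{x}}$. We notice that $\mathbf{D}_h$ is a diagonal matrix containing the frequency response of the FIR filter. We can connect the time-domain representation in (2.215) and the frequency-domain representation in (2.216) using the DFT matrix $\mathbf{F}_S$. We know from (2.199) that $\bar{\mathbf{x}} = \mathbf{F}_S\mathbf{x}$, which also implies that $\bar{\mathbf{y}} = \mathbf{F}_S\mathbf{y}$. By substituting these expressions into (2.216), we obtain

$$\mathbf{F}_S\mathbf{y} = \mathbf{D}_h\mathbf{F}_S\mathbf{x} \;\Rightarrow\; \mathbf{y} = \mathbf{F}_S^{\mathrm{H}}\mathbf{D}_h\mathbf{F}_S\mathbf{x}. \tag{2.217}$$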
Figure 2.33: The complex exponential signal in (2.219) travels along the $x$-axis, and the signal at time $t = 0$ is shown. The wavelength $\lambda$ is the spatial interval between two peaks. The spatial frequency $1/\lambda$ is the number of wavelengths that fit into one meter.


By comparing (2.217) with (2.215), we notice that the circulant matrix $\mathbf{C}_h$ can be alternatively expressed as

$$\mathbf{C}_h = \mathbf{F}_S^{\mathrm{H}}\mathbf{D}_h\mathbf{F}_S. \tag{2.218}$$

Since $\mathbf{D}_h$ is a diagonal matrix and $\mathbf{F}_S$ is a unitary matrix, we recognize (2.218) as the eigendecomposition of $\mathbf{C}_h$; it has the same structure as in Lemma 2.1, except that the eigenvalues can be complex in this case since $\mathbf{C}_h$ is generally not Hermitian. The eigenvalues are the entries $\bar{h}[0],\ldots,\bar{h}[S-1]$ of the frequency response of the filter, while the eigenvectors are the columns of the IDFT matrix $\mathbf{F}_S^{\mathrm{H}}$. Since this result holds for any cyclic convolution, we can conclude that the DFT matrix diagonalizes any circulant matrix.
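The eigendecomposition (2.218) implies $\mathbf{F}_S\mathbf{C}_h\mathbf{F}_S^{\mathrm{H}} = \mathbf{D}_h$. This can be confirmed numerically with the sketch below, which contains its own small matrix helpers (all names are ours, and the taps are a hypothetical example).

```python
import cmath
import math

def dft_matrix(S):
    """Unitary DFT matrix F_S of (2.198)."""
    return [[cmath.exp(-2j * math.pi * nu * s / S) / math.sqrt(S) for s in range(S)]
            for nu in range(S)]

def hermitian(M):
    """Conjugate transpose M^H."""
    return [[M[c][r].conjugate() for c in range(len(M))] for r in range(len(M[0]))]

def matmul(A, B):
    """Plain matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def circulant_matrix(h, S):
    """Circulant C_h with entry (k, i) = h[(k - i) mod S]."""
    taps = list(h) + [0.0] * (S - len(h))
    return [[taps[(k - i) % S] for i in range(S)] for k in range(S)]

S = 6
h = [1.0, 0.5, -0.25, 0.1]  # hypothetical taps, T = 3 < S
F = dft_matrix(S)
C = circulant_matrix(h, S)
D = matmul(matmul(F, C), hermitian(F))  # F_S C_h F_S^H, i.e., (2.218) rearranged
hbar = [sum(h[l] * cmath.exp(-2j * math.pi * l * nu / S) for l in range(len(h)))
        for nu in range(S)]               # frequency response (2.213)

off_diag = max(abs(D[i][j]) for i in range(S) for j in range(S) if i != j)
diag_err = max(abs(D[i][i] - hbar[i]) for i in range(S))
print(off_diag, diag_err)  # both close to zero: D is diagonal with the frequency response
```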


2.8.3 Temporal and Spatial Frequencies 


The DFT was introduced in this section to study the temporal frequencies contained in a time-varying signal, but there is another related concept: spatial frequencies. When an electromagnetic signal propagates through free space, it can be observed simultaneously at many spatial locations, but it will be delayed differently depending on how far it has traveled from the signal source. Suppose the complex exponential signal $e^{j2\pi f_c t} = \cos(2\pi f_c t) + j\sin(2\pi f_c t)$ with the temporal frequency $f_c$ is emitted from a source located at the origin, as illustrated in Figure 2.33. The signal observed at the spatial location $x > 0$ along the positive $x$-axis at the time $t$ is

$$e^{j2\pi f_c\left(t - \frac{x}{c}\right)} = e^{j2\pi f_c t}\,e^{-j\frac{2\pi}{\lambda}x}, \tag{2.219}$$

where $x/c$ is the propagation delay, $c$ is the speed of light, and the wavelength at the carrier frequency is denoted by $\lambda = c/f_c$. For a given communication system, the carrier frequency is predetermined, while the wavelength might change depending on the speed of light, which is reduced in some propagation media compared to its maximum value 299 792 458 m/s obtained in free space (i.e., vacuum). We will treat $c$ as equal to the maximum in this book since the waves reach the receiver through the air, where the propagation speed is almost identical to that in vacuum. The factor $e^{j2\pi f_c t}$ in (2.219) determines the temporal signal variations, while the factor $e^{-j\frac{2\pi}{\lambda}x}$ determines the spatial variations. At time $t = 0$, the signal observed along the $x$-axis is

$$e^{-j\frac{2\pi}{\lambda}x} = \cos\left(\frac{2\pi}{\lambda}x\right) - j\sin\left(\frac{2\pi}{\lambda}x\right), \tag{2.220}$$
which is a periodic function that repeats itself every $\lambda$ meters; thus, the spatial frequency is $1/\lambda$, representing the number of periods per meter. The spatial frequency is also called the wavenumber, but we will use the spatial frequency terminology in this book to highlight that signals obtained in the time and space domains can be studied using the same methods (e.g., the sampling theorem and filtering). Spatial frequencies can be positive and negative, but the convention is that there is a minus sign in the complex exponential as in (2.220) when the spatial frequency is positive. In this way, the positive temporal frequency $f_c$ gives rise to the positive spatial frequency $1/\lambda$. In this example, the spatial frequency is the same at any time $t$ since the wave is shifted to the right as it travels along the line. This follows from the fact that the time variable $t$ and spatial variable $x$ affect different factors in (2.219).
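The relation $\lambda = c/f_c$ and the spatial period in (2.220) can be illustrated numerically. The carrier frequency of 3 GHz below is only an example value we picked, and the function names are illustrative.

```python
import cmath
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum (m/s), as stated in the text

def wavelength(fc_hz):
    """lambda = c / f_c."""
    return C_LIGHT / fc_hz

def spatial_signal(x_m, fc_hz):
    """exp(-j 2 pi x / lambda) at position x (meters), i.e., (2.220) at t = 0."""
    return cmath.exp(-2j * math.pi * x_m / wavelength(fc_hz))

fc = 3e9              # example carrier frequency: 3 GHz (our choice)
lam = wavelength(fc)
print(lam)            # about 0.0999 m
print(1 / lam)        # spatial frequency, about 10 periods per meter
# The spatial signal repeats itself every lambda meters
for x in [0.0, 0.013, 0.2, 1.7]:
    assert abs(spatial_signal(x + lam, fc) - spatial_signal(x, fc)) < 1e-9
```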

The temporal frequency $f_c$ and the spatial frequency $1/\lambda$ are closely related in wireless signaling (they only differ by a factor $c$), but there is a distinct conceptual difference. One way to separate the concepts is to consider a video recording of wave propagation (e.g., ocean waves). A video contains a sequence of frames (pictures shown at different times), and each frame consists of colored pixels at different screen locations. The temporal frequency describes how the wave observed at a particular pixel evolves with time. In contrast, the spatial frequency describes how the waves at a particular time instant oscillate between the pixels in the current frame. The fundamental
relation between temporal and spatial frequency breaks down when static 
objects are introduced in the propagation environment. In that case, the 
temporal frequency remains the same, but the waves change directions when 
interacting with the objects, changing the spatial frequency observed along 
the given line. The connection also breaks down when observing the wave 
along a line that is not parallel to the direction the wave travels. 

Figure 2.34 shows how the sinusoid $\cos(2\pi f_c t)$ propagates radially in two dimensions from a transmitter located at the origin, where the coloring describes its value. The signal observed at the point $(x, y)$ at time $t$ is

$$\begin{aligned} \cos\left(2\pi f_c\left(t - \frac{\sqrt{x^2+y^2}}{c}\right)\right) &= \cos\left(2\pi f_c t - \frac{2\pi}{\lambda}\sqrt{x^2+y^2}\right) \\ &= \frac{1}{2}e^{j2\pi f_c t}e^{-j\frac{2\pi}{\lambda}\sqrt{x^2+y^2}} + \frac{1}{2}e^{-j2\pi f_c t}e^{j\frac{2\pi}{\lambda}\sqrt{x^2+y^2}}, \end{aligned} \tag{2.221}$$

which is obtained similarly to (2.219) but with the propagation distance computed as $\sqrt{x^2+y^2}$. We also used Euler's formula as in (2.8) to express the cosine as two complex exponentials, which reveals that the considered signal contains the spatial frequencies $1/\lambda$ and $-1/\lambda$. The figure shows this signal at time $t = 0$, and we observe that the pattern is invariant to rotations around the origin since the signal propagates equally in all angular directions. The distance between two adjacent peaks in any radial direction equals the wavelength $\lambda$.
Figure 2.34: A sinusoidal wave propagates radially from a transmitter located at the origin.
The middle figure shows the signal at different locations at t = 0. The upper and lower figures 
show how the signals observed along two lines contain different spatial frequencies. 
At the bottom of the figure, the waveform observed along the black line is shown. This line covers the positive $x$-axis, a radial direction from the origin. At time $t = 0$, we observe a sinusoid $\cos(2\pi x/\lambda)$ with the wavelength $\lambda$ and the spatial frequencies $\pm 1/\lambda$. The signal observed along the blue line is shown at the top of the figure. This signal appears aperiodic and contains a broader range of spatial frequencies. The reason is that the wave propagation is not aligned with the direction of the line. The distance between adjacent peaks varies but is larger than $\lambda$, which indicates that the observed signal only contains spatial frequencies in the range $[-1/\lambda, 1/\lambda]$.


There are two main messages from this example. Firstly, the spatial frequencies of the signal observed along a given line segment depend on the location of the source. Hence, the observed signal can be used to identify the source location or at least its angular direction. This estimation problem will be considered in later chapters. Secondly, the observed signal has the original spatial frequencies $\pm 1/\lambda$ when considering a line drawn in the same direction as the wave propagation, while smaller spatial frequencies (in the magnitude sense) are observed when the direction of the line is not aligned with the wave propagation. Suppose we insert $B = 2/\lambda$ into the sampling theorem in Lemma 2.8. In that case, it states that we can capture all useful information from any signal containing spatial frequencies in the range $[-1/\lambda, 1/\lambda)$ by taking samples spaced $1/B = \lambda/2$ apart. Hence, for any of the considered lines, measuring the signal at locations spaced apart by $\lambda/2$ is sufficient. This principle will guide us later when designing antenna arrays. Strictly speaking, the spacing between the sampling locations should be smaller than $\lambda/2$, because the cosine signal contains both the spatial frequencies $-1/\lambda$ and $1/\lambda$, and the latter lies on the excluded upper limit of the interval. Aliasing might appear when sampling precisely at the Nyquist rate, which we will discuss further in Chapter 4. We will also show that an antenna array's ability to distinguish between signals arriving from different directions is determined by its ability to separate the spatial frequencies of these signals.
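The spacing $1/B = \lambda/2$ obtained from the sampling theorem with $B = 2/\lambda$ is easy to evaluate for concrete carrier frequencies. The frequencies in the loop are example values of our choosing, and the function name is illustrative.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum (m/s)

def half_wavelength_spacing(fc_hz):
    """Spacing 1/B with B = 2/lambda spatial samples per meter, i.e., lambda/2."""
    lam = C_LIGHT / fc_hz
    sampling_rate = 2 / lam  # B = 2 / lambda samples per meter
    return 1 / sampling_rate

for fc in [1e9, 3.5e9, 28e9]:  # example carrier frequencies (our choice)
    print(f"{fc / 1e9:5.1f} GHz: spacing {100 * half_wavelength_spacing(fc):.2f} cm")
```

Since the spacing scales as $1/f_c$, a higher carrier frequency allows more sampling locations (and later, antennas) to fit within a given length.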


The DFT can be applied to samples obtained at the same time but at 
different spatial locations. It will then reveal the spatial frequencies present in 
the spatial signal samples. An example of this is shown in Figure 2.35, where 
we take samples from the upper and lower curves in Figure 2.34. The S = 10 
sample points per curve are indicated by circles in that previous figure and 
are spaced apart by $\lambda/2$, as suggested by the sampling theorem. Since we are
taking spatial samples, the DFT computes the normalized spatial frequencies. 
Figure 2.35(a) shows the DFT of the blue upper curve in Figure 2.34, which 
contains a wide range of spatial frequencies since the waveform is sampled in a 
dimension that is not aligned with the direction of the propagating waveform. 
Since the original signal is real-valued, there is a symmetry between the positive and negative spatial frequencies. Figure 2.35(b) shows the DFT of the black signal, which only contains the normalized frequency $-1/2$, which is equivalent to $1/2$. A single point in the DFT represents both frequencies due to the aliasing that can appear when sampling precisely at the Nyquist rate. However, from the preceding discussion, we know that the signal only contains spatial frequencies smaller than or equal to $1/\lambda$ in magnitude; the fastest changes always occur in the direction the wave propagates. When combined with the prior knowledge that the signal is real-valued, it is possible to reconstruct the original signal in this special case. Since the spatial sampling rate is $2/\lambda$ samples per meter, the true spatial frequencies are $\pm\frac{1}{2}\cdot\frac{2}{\lambda} = \pm\frac{1}{\lambda}$, as anticipated from the previous discussion.

Figure 2.35: The DFTs of the spatially sampled waveforms from the upper and lower curves of Figure 2.34. In this case, the DFT describes spatial frequencies, and the figures show the magnitudes (i.e., absolute values) since the DFTs can be complex-valued. Panel (b) shows the DFT of the lower curve in Figure 2.34.
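The DFT of the black-line samples can be reproduced under simple assumptions: the wavelength is normalized to $\lambda = 1$ m, and the $S = 10$ samples start at $x = 0$ with spacing $\lambda/2$ (the exact sample positions in Figure 2.34 may differ). The sketch, which uses our own `dft` helper implementing (2.195), also converts the DFT peak back to a spatial frequency using the sampling rate $2/\lambda$.

```python
import cmath
import math

def dft(x):
    """Unitary DFT as in (2.195)."""
    S = len(x)
    return [sum(x[s] * cmath.exp(-2j * math.pi * s * nu / S) for s in range(S)) / math.sqrt(S)
            for nu in range(S)]

lam = 1.0                                     # wavelength normalized to 1 m (assumption)
S = 10
positions = [n * lam / 2 for n in range(S)]   # lambda/2 spacing, starting at x = 0 (assumption)
samples = [math.cos(2 * math.pi * x / lam) for x in positions]  # black line at t = 0
xbar = dft(samples)

peak = max(range(S), key=lambda nu: abs(xbar[nu]))
f_norm = peak / S - 1 if peak / S >= 0.5 else peak / S  # map to [-1/2, 1/2)
spatial_rate = 2 / lam                                  # samples per meter
print(peak, f_norm, abs(f_norm) * spatial_rate)         # 5 -0.5 1.0, i.e., |frequency| = 1/lambda
```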
2.9 Exercises


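Exercise 2.1. Consider two orthogonal vectors $\mathbf{x}_1 \in \mathbb{C}^M$ and $\mathbf{x}_2 \in \mathbb{C}^M$.

(a) What is the projection $\mathbf{y}_{\mathrm{proj},\mathbf{x}_1,\mathbf{x}_2}$ of another vector $\mathbf{y} \in \mathbb{C}^M$ onto the space spanned by $\mathbf{x}_1$ and $\mathbf{x}_2$? Hint: Express the projection as $\mathbf{y}_{\mathrm{proj},\mathbf{x}_1,\mathbf{x}_2} = \alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2$ and find the coefficients $\alpha_1, \alpha_2 \in \mathbb{C}$ that make the residual vector $\mathbf{y} - \mathbf{y}_{\mathrm{proj},\mathbf{x}_1,\mathbf{x}_2}$ orthogonal to $\mathbf{x}_1$ and $\mathbf{x}_2$.

(b) Generalize the result from (a) to the case where we project $\mathbf{y}$ onto the space that is spanned by the $L < M$ orthogonal vectors $\mathbf{x}_1,\ldots,\mathbf{x}_L \in \mathbb{C}^M$. Show that we can write the projection as $\mathbf{y}_{\mathrm{proj},\mathbf{x}_1,\ldots,\mathbf{x}_L} = \mathbf{P}\mathbf{y}$ and obtain an expression for the projection matrix $\mathbf{P}$.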


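Exercise 2.2. Let $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_M)$ and $\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{R})$ be $M$-dimensional complex Gaussian random vectors. Moreover, let $\mathbf{z} = [z_1,\ldots,z_M]^{\mathrm{T}}$ be a random vector with independent and identically distributed entries $z_m \sim \mathrm{Exp}(1/3)$ for $m = 1,\ldots,M$.

(a) Compute $\mathbb{E}\{|\mathbf{v}^{\mathrm{H}}\mathbf{y}|^2\}$ for a given deterministic vector $\mathbf{v} = [v_1,\ldots,v_M]^{\mathrm{T}} \in \mathbb{C}^M$.

(b) Compute $\mathbb{E}\{|\mathbf{v}^{\mathrm{H}}\mathbf{z}|^2\}$ for a given deterministic vector $\mathbf{v} = [v_1,\ldots,v_M]^{\mathrm{T}} \in \mathbb{C}^M$.

(c) Compute $\mathrm{Var}\{\|\mathbf{A}\mathbf{x}\|^2\}$, where $\mathbf{A} \in \mathbb{C}^{K \times M}$ is a deterministic matrix with $K > M$. Each column of $\mathbf{A}$ has a norm equal to 2 and is orthogonal to the other columns.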
Exercise 2.3. When using PAM, the continuous-time complex-baseband signal can be expressed as in (2.120), which we repeat here as

$$z(t) = \sum_{k=-\infty}^{\infty} z[k]\,p\!\left(t - \frac{k}{B}\right). \tag{2.222}$$

The Nyquist criterion says that $z(n/B) = A z[n]$, where $A \neq 0$ is an arbitrary constant. It can be equivalently expressed by multiplying $z(t)$ by the impulse train $\sum_{\tau=-\infty}^{\infty}\delta(t - \tau/B)$ and equating it to the impulse train weighted by the desired symbols:

$$z(t)\sum_{\tau=-\infty}^{\infty}\delta\!\left(t - \frac{\tau}{B}\right) = A\sum_{\tau=-\infty}^{\infty} z[\tau]\,\delta\!\left(t - \frac{\tau}{B}\right). \tag{2.223}$$

(a) By taking the Fourier transform of both sides of (2.223), derive the condition that the Fourier transform of the pulse must satisfy

$$B\sum_{\tau=-\infty}^{\infty} P(f - \tau B) = A \tag{2.224}$$

for the Nyquist criterion to hold. Hint: Use the fact that the Fourier transform of the impulse train is given by $\mathcal{F}\left\{\sum_{\tau=-\infty}^{\infty}\delta(t - \tau/B)\right\} = B\sum_{\tau=-\infty}^{\infty}\delta(f - \tau B)$.

(b) Verify that the sinc pulse is the most bandwidth-efficient pulse that satisfies the 
Nyquist criterion using the condition in (2.224). 


(c) Determine whether the Nyquist criterion holds or not for the so-called raised-cosine pulse (with roll-off factor 0.5) that has the Fourier transform


1 if|fI< 3 
POf)= 95 (1 +sin a) aS, (2.225) 
0 if |f| > 32. 
Exercise 2.4. Consider the LTI system in Figure 2.10(a) with the impulse response

$$h(t) = \mathrm{rect}\left(\frac{t - T/2}{T}\right) = \begin{cases} 1, & \text{if } 0 < t < T, \\ 0, & \text{otherwise}. \end{cases} \tag{2.226}$$

The input signal $z_p(t)$ is arbitrary and the complex-baseband equivalent input signal is denoted as $z(t)$.

(a) Find the complex-baseband representation of the output signal $v_p(t)$ in terms of $z(t)$ and the carrier frequency $f_c$ by first filtering the signal in the passband and then downshifting $v_p(t)$ to the complex baseband.

(b) Compare the result obtained in (a) with the one obtained by first transforming the input signal $z_p(t)$ to the complex baseband and then filtering it with the equivalent complex-baseband filter from (2.117).


Exercise 2.5. Consider the noise samples $n[l]$ in (2.123), where $w(t)$ is a white circularly symmetric complex Gaussian random process with the constant power spectral density $N_0$ and $p(t)$ is the sinc pulse defined in (2.118).

(a) Prove that the variance of $n[l]$ is $N_0$.

(b) Prove that the noise samples $n[l]$ and $n[m]$ obtained from (2.123) for $l \neq m$ are independent. Hint: Use the identity

$$\int_{-\infty}^{\infty}\mathrm{sinc}(l - t)\,\mathrm{sinc}(m - t)\,dt = 0, \tag{2.227}$$

which holds for any integers $l$ and $m$ such that $l \neq m$.
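
Exercise 2.6. Consider the linear observation model

$$ \mathbf{z} = \mathbf{A}\mathbf{v} + \mathbf{n}, \tag{2.228} $$

where $\mathbf{v} \in \mathbb{C}^K$ and $\mathbf{n} \in \mathbb{C}^M$ are independent random vectors. Their entries are
independent and identically distributed with zero mean and unit variance. The matrix
$\mathbf{A} \in \mathbb{C}^{M \times K}$ is deterministic. Hence, the covariance matrices of v and z are $\mathbb{E}\{\mathbf{v}\mathbf{v}^H\} = \mathbf{I}_K$
and $\mathbb{E}\{\mathbf{z}\mathbf{z}^H\} = \mathbf{A}\mathbf{A}^H + \mathbf{I}_M$, respectively. The LMMSE estimate of v based on the
observation z is

$$ \hat{\mathbf{v}} = \mathbf{A}^H\left(\mathbf{A}\mathbf{A}^H + \mathbf{I}_M\right)^{-1}\mathbf{z}. \tag{2.229} $$

(a) Verify the orthogonality principle $\mathbb{E}\{(\mathbf{v}-\hat{\mathbf{v}})\mathbf{z}^H\} = \mathbf{0}$ for the given
LMMSE estimator, i.e., that the estimation error is uncorrelated with the observation.

(b) Suppose $\mathbf{v} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_K)$ and $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_M)$. Show that the LMMSE estimator
in (2.229) is also the MMSE estimator by verifying that $\mathbf{A}^H(\mathbf{A}\mathbf{A}^H + \mathbf{I}_M)^{-1}\mathbf{z}$
is the mean of the conditional PDF $f_{\mathbf{v}|\mathbf{z}}(\mathbf{v}|\mathbf{z})$. Hint: Use the matrix identity
$\det(\mathbf{A}\mathbf{A}^H + \mathbf{I}_M) = \det(\mathbf{A}^H\mathbf{A} + \mathbf{I}_K) = 1/\det\left((\mathbf{A}^H\mathbf{A} + \mathbf{I}_K)^{-1}\right)$ and the identity in
(2.50).

(c) Find the MMSE estimate of $\mathbf{v} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_M)$ based on the alternative observation

$$ \mathbf{y} = \mathbf{v} + \mathbf{c}, \tag{2.230} $$

where $\mathbf{c} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C})$ is independent noise with an invertible covariance matrix
C. Hint: Use whitening and then (2.229).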
Exercise 2.7. Consider a narrowband channel with L paths. The channel response is
modeled according to (2.131) as

$$ h = \sum_{i=1}^{L} \alpha_i e^{-j2\pi f_c(\tau_i - \tau)}. \tag{2.231} $$


(a) Does $|h|$ depend on the value of $\tau$?


(b) Suppose there are L = 2 paths and $\alpha_1 = \alpha_2 = 1$. For which values of $\tau_1$ and $\tau_2$ is
$|h|$ maximized? For which values is $|h|$ minimized?


(c) Define $\psi_i = 2\pi f_c(\tau_i - \tau)$ and assume that it is a uniformly distributed random
variable between $-\pi$ and $\pi$. Compute $\mathbb{E}\{|h|^2\}$ assuming that $\alpha_1,\ldots,\alpha_L$ are
deterministic, while $\psi_1,\ldots,\psi_L$ are mutually independent. Hint: Use that $\mathbb{E}\{e^{j\psi_i}\} = 0$.


(d) Redo (c) under the assumption that $\alpha_1,\ldots,\alpha_L$ are also independent random
variables, uniformly distributed between 0 and 1.


Exercise 2.8. Consider the complex-valued AWGN channel $y = x + n$ with B samples
per second. Its capacity is $B\log_2(1 + P/(BN_0))$, which follows from (2.146) with $\beta = 1$.
Decompose the channel into two real-valued AWGN channels.


(a) Are the two real-valued AWGN channels independent? 
(b) How many samples per second do we have for each of the two channels? 


(c) Suppose we transmit with a power of P > 0 Watt and place all the power in only 
one of the two real-valued AWGN channels. What is the capacity expressed in 
bits per second? 


(d) Is the result in (c) higher or lower than the capacity of the complex-valued AWGN 
channel? 


Exercise 2.9. A friend claims we can double the capacity (in bit/s) by doubling the 
bandwidth. Is this correct? If yes, use the capacity formula to prove it. If no, explain 
what else needs to be done to achieve twice the capacity. 


Exercise 2.10. The received signal power reduces with the propagation distance d. This
can be modeled as $\Upsilon(1\,\mathrm{m}/d)^{\alpha}P$ using the parametric channel gain model in (1.9), where
P is the transmit power, $\alpha > 1$ is the pathloss exponent, and $\Upsilon > 0$ is a constant
propagation loss.


(a) Suppose the channel is modeled as in (2.144). How can we select h to get the right 
received signal power? What is the resulting capacity expression? 

(b) Consider B = 10 MHz, $N_0 = -174$ dBm/Hz, P = 30 dBm, $\Upsilon = -37$ dB, and $\alpha = 3.7$.
What is the SNR for a user at the distance d = 200 m? What is the capacity (in
bit/s)?


(c) What will the capacity be for a user at the distance 4d? How can the transmit 
power be scaled to achieve the same capacity as in (b)? 


(d) What will the capacity be for a user at the distance d/2? How can the transmit 
power be scaled to achieve the same capacity as in (b)? 
Exercise 2.11. The capacity of the discrete memoryless channel $y = h\cdot x + n$ is achieved
by the input signal $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$, as proved in Corollary 2.1. Suppose we instead send two
independent signals over the channel: $x_1 \sim \mathcal{N}_{\mathbb{C}}(0, q_1)$ and $x_2 \sim \mathcal{N}_{\mathbb{C}}(0, q_2)$. The resulting
received signal is

$$ y = h\cdot(x_1 + x_2) + n, \tag{2.232} $$

where $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent complex Gaussian noise. What is the corresponding
channel capacity, which is achieved by selecting $q_1, q_2$ to maximize the mutual information
$H(y) - H(y|x_1, x_2)$ under the constraint $q_1 + q_2 \leq q$?


Exercise 2.12. Consider a random variable x with zero mean and variance $\sigma^2$. We want
to estimate $\sigma^2$ from L independent realizations of x, which are denoted
$x_1,\ldots,x_L$. The following estimator is utilized:

$$ \hat{\sigma}^2 = \frac{\sum_{l=1}^{L}|x_l|^2}{K}, \tag{2.233} $$

where K is a predetermined scalar.


(a) For which value of K is the considered estimator unbiased? Does the answer depend
on the specific distribution of x?


(b) For which value of K will the considered estimator achieve the minimum MSE?
Does the answer depend on more than the mean and variance of x? What is the
MSE-minimizing value of K if $x \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$?


Exercise 2.13. Consider the binary hypothesis test

$$ \mathcal{H}_0:\; y[l] = n[l], \quad l = 1,\ldots,L, \tag{2.234} $$
$$ \mathcal{H}_1:\; y[l] = 1 + n[l], \quad l = 1,\ldots,L, \tag{2.235} $$

where the detector decides whether "1" is transmitted or not by observing multiple
received signals $y[l]$. Unlike the hypothesis test in (2.181), L consecutive received signals
are considered. The real-valued noise samples $n[l]$ are independent and identically
distributed as $n[l] \sim \mathcal{N}(0, \sigma^2)$, for $l = 1,\ldots,L$.


(a) For a given value of $\gamma = \Pr\{\mathcal{H}_1\}/\Pr\{\mathcal{H}_0\}$, derive the Bayesian detector that
minimizes the error probability. What are $P_{\mathrm{D}}$ and $P_{\mathrm{FA}}$ for this detector? Hint:
The answers are integral expressions.


(b) For a given value of $P_{\mathrm{FA}} = \alpha$, derive the Neyman-Pearson detector that maximizes
the detection probability $P_{\mathrm{D}}$. What is $P_{\mathrm{D}}$ for this detector?


Exercise 2.14. Consider the continuous-time signal $x(u) = 2\cos(200\pi u) + 3\sin(600\pi u)$,
which is sampled to obtain the S = 7-length sequence $x[s] = x(s/B)$, for $s = 0,\ldots,6$.
What is the DFT of the sequence $x[s]$ when the sampling rate is B = 700 samples/s?


Exercise 2.15. Prove Parseval's relation in (2.196) using the unitary property of the
DFT matrix $\mathbf{F}_S$.
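Exercise 2.16. Suppose the S-length sequence $\mathbf{a} = [a[0],\ldots,a[S-1]]^T$ has the DFT
$\check{\mathbf{a}} = [\check{a}[0],\ldots,\check{a}[S-1]]^T$. Consider the $S \times S$ circulant matrix defined similar to (2.215)

$$ \mathbf{C}_{\check{a}} = \begin{bmatrix} \check{a}[0] & \check{a}[S-1] & \check{a}[S-2] & \cdots & \check{a}[1] \\ \check{a}[1] & \check{a}[0] & \check{a}[S-1] & \cdots & \check{a}[2] \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ \check{a}[S-1] & \check{a}[S-2] & \cdots & \check{a}[1] & \check{a}[0] \end{bmatrix}, \tag{2.236} $$

but for the DFT sequence. We further define a diagonal matrix containing the time-domain
sequence a:

$$ \mathbf{D}_a = \begin{bmatrix} a[0] & 0 & \cdots & 0 \\ 0 & a[1] & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & a[S-1] \end{bmatrix}. \tag{2.237} $$


(a) Derive a decomposition of $\mathbf{C}_{\check{a}}$ in terms of $\mathbf{D}_a$ and the DFT matrix $\mathbf{F}_S$ similar to
(2.218). Hint: Switch the roles of the sequences in time and frequency. From the
structure of the DFT matrix, it holds that $\mathbf{F}_S^T = \mathbf{F}_S$ and $\mathbf{F}_S^* = \mathbf{F}_S^H$.


(b) Consider another S-length sequence $\mathbf{b} = [b[0],\ldots,b[S-1]]^T$ which has the DFT
$\check{\mathbf{b}} = [\check{b}[0],\ldots,\check{b}[S-1]]^T$. Prove that the DFT of the sequence $a[k]b[k]$ (for
$k = 0,\ldots,S-1$) is given by $(\check{a} \circledast \check{b})[\nu]/\sqrt{S}$ (for $\nu = 0,\ldots,S-1$) by using the obtained
decomposition of $\mathbf{C}_{\check{a}}$ and the properties of $\mathbf{F}_S$. Hint: The kth entry of the S-length
vector $\mathbf{D}_a\mathbf{b}$ is $a[k-1]b[k-1]$.


(c) For the given sequences a[k] = dT, b[k] = dT, for $k = 0,\ldots,9$, verify that the
DFT of the sequence $a[k]b[k]$ is given by $(\check{a} \circledast \check{b})[\nu]/\sqrt{S}$.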
Exercise 2.17. The signal $x(t) = \cos(2\pi f_1 t)$ is modulated to the carrier frequency $f_c$ by
computing $x_p(t) = x(t)\cos(2\pi f_c t)$, where $f_c > f_1$.
(a) Which positive and negative (temporal) frequencies does $x_p(t)$ contain?


(b) The signal $x_p(t)$ is radiated from an antenna located at the origin and propagates
at the speed of light c. Which spatial frequencies can be observed along the y-axis?


(c) What happens to the temporal and spatial frequencies if the signal propagates 
through a medium where the propagation speed v is smaller than c (i.e., the speed 
of light in free space)? 


Chapter 3 


Capacity of Point-to-Point MIMO Channels 


In this chapter, we will characterize the channel capacity in memoryless point- 
to-point scenarios where one transmitter communicates with one receiver 
without impacting other systems. We will distinguish between four cases: 


1. Single-input single-output (SISO) channel: The transmitter and receiver 
have one antenna each. 


2. Single-input multiple-output (SIMO) channel: The transmitter has one 
antenna and the receiver has multiple antennas. 


3. Multiple-input single-output (MISO) channel: The transmitter has mul- 
tiple antennas and the receiver has one antenna. 


4. Multiple-input multiple-output (MIMO) channel: Both the transmitter 
and receiver have multiple antennas. 


These cases are illustrated in Figure 3.1. The capacity of the SISO channel 
was derived and discussed in Section 2.4.1. This chapter will generalize the 
theory to capture the other three cases, one after the other. The results will 
be utilized in the remainder of the book to study specific communication 
scenarios and channel conditions. 


3.1 Impact of Power and Bandwidth on the Capacity 


Before introducing multiple antennas, we will return to the channel capacity 
for SISO channels in (2.146) and shed some light on how it depends on the 
transmit power P and the bandwidth B. The purpose is to understand how 
the capacity can be improved. For notational convenience, we now explicitly 
write the capacity in (2.146) as a function C(P, B) of these variables: 

$$ C(P,B) = B\log_2\left(1 + \frac{P\beta}{BN_0}\right) \text{ bit/s}. \tag{3.1} $$




[Figure 3.1 panels: (a) Point-to-point SISO channel. (b) Point-to-point SIMO channel.
(c) Point-to-point MISO channel. (d) Point-to-point MIMO channel.]


Figure 3.1: The four kinds of point-to-point communication channels where the transmitter
and receiver have either one or multiple antennas.
Since the capacity involves a logarithm, it is useful to notice that

$$ \log_2(1+z) \approx \log_2(e)\,z \quad \text{if } z \approx 0, \tag{3.2} $$
$$ \log_2(1+z) \approx \log_2(z) \quad \text{if } z \gg 1, \tag{3.3} $$


where $e \approx 2.71828$ is Euler's number. The expression in (3.2) is the first-order
Taylor approximation of $\log_2(1+z)$ around $z = 0$. Since $\log_2(1+z)$ with the
SNR $z = P\beta/(BN_0)$ appears in the capacity expression (3.1), (3.2) and (3.3) will
help us to understand the capacity behavior at low and high SNR, respectively.
The notions of low/high SNRs can be interpreted as follows.


Example 3.1. For which ranges of $z > 0$ will the approximations of $\log_2(1+z)$
in (3.2) and (3.3) lead to absolute errors that are smaller than 0.1?

The low SNR approximation $\log_2(1+z) \approx \log_2(e)z$ in (3.2) is based on a
first-order Taylor approximation, and it can be written in an exact form as

$$ \log_2(1+z) = \log_2(e)\,z - \log_2(e)\frac{z^2}{2(1+a)^2} \quad \text{for some } 0 < a < z, \tag{3.4} $$

where the second term is the Lagrange form of the remainder. The absolute
approximation error can be upper bounded using (3.4) as

$$ \left|\log_2(e)\,z - \log_2(1+z)\right| = \log_2(e)\frac{z^2}{2(1+a)^2} \leq \log_2(e)\frac{z^2}{2}, \tag{3.5} $$

where the last step follows from setting a = 0 to get the largest possible error.
Based on this upper bound, the absolute error is smaller than 0.1 when

$$ z < \sqrt{\frac{0.2}{\log_2(e)}} \approx 0.37 \approx -4.3\ \text{dB}. \tag{3.6} $$

We can find the exact solution by solving $\log_2(e)z - \log_2(1+z) < 0.1$ numerically,
which results in the somewhat larger range $z \lesssim 0.42 \approx -3.8$ dB.

For the high SNR approximation $\log_2(1+z) \approx \log_2(z)$ in (3.3), the absolute
error is $\log_2(1+z) - \log_2(z) = \log_2(1+1/z)$. To guarantee $\log_2(1+1/z) < 0.1$,
we should have

$$ z > \frac{1}{2^{0.1}-1} \approx 13.93 \approx 11.4\ \text{dB}. \tag{3.7} $$
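The three thresholds in Example 3.1 can be double-checked numerically. The following is a minimal sketch in standard-library Python; the helper names (`low_snr_error`, `bisect`, `to_db`) are illustrative and not from the text. It evaluates the bound-based threshold in (3.6), solves $\log_2(e)z - \log_2(1+z) = 0.1$ by bisection (valid because the approximation error grows monotonically with z), and evaluates the high-SNR threshold in (3.7).

```python
import math

LOG2E = math.log2(math.e)  # log2(e), approximately 1.4427


def low_snr_error(z):
    """Absolute error of the low-SNR approximation (3.2); nonnegative and increasing in z."""
    return LOG2E * z - math.log2(1 + z)


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def to_db(z):
    return 10 * math.log10(z)


# Threshold from the upper bound (3.5)-(3.6): log2(e) z^2 / 2 < 0.1
z_bound = math.sqrt(0.2 / LOG2E)
# Exact low-SNR threshold: solve log2(e) z - log2(1 + z) = 0.1
z_exact = bisect(lambda z: low_snr_error(z) - 0.1, 0.0, 10.0)
# High-SNR threshold (3.7): log2(1 + 1/z) < 0.1  <=>  z > 1 / (2^0.1 - 1)
z_high = 1 / (2 ** 0.1 - 1)

for name, z in [("bound", z_bound), ("exact", z_exact), ("high", z_high)]:
    print(f"{name}: z = {z:.2f} ({to_db(z):.1f} dB)")
```

Running it should print thresholds of about 0.37, 0.42, and 13.93 (that is, -4.3 dB, -3.8 dB, and 11.4 dB), matching (3.6), the numerical solution, and (3.7).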


When varying the transmit power P, we notice that $C(P,B)$ is a monotonically
increasing function of P. It starts at $C(0,B) = 0$ and then grows
linearly with P when the SNR $P\beta/(BN_0)$ is small. We can utilize (3.2) to obtain

$$ C(P,B) \approx B\log_2(e)\frac{P\beta}{BN_0} = \log_2(e)\frac{P\beta}{N_0} \tag{3.8} $$

at low SNR, which is independent of the bandwidth.
Figure 3.2: The capacity behavior in a single-antenna system when changing the transmit
power P, for B = 10 MHz and $\beta/N_0 = 10^8$ Hz/W.


When the SNR is large, the capacity only grows logarithmically with
P due to (3.3). There is no upper limit on how large a capacity
we can achieve by increasing P, but the capacity growth is slow once we
have reached the logarithmic growth regime at high SNR. Figure 3.2 illustrates
these behaviors by showing $C(P,B)$ as a function of P for B = 10 MHz and
$\beta/N_0 = 10^8$ Hz/W. The capacity grows linearly with P in the low SNR region,
while the logarithmic behavior appears in the high SNR region.
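To see the two regimes numerically, here is a small sketch that evaluates (3.1) together with the low-SNR approximation (3.8) and the high-SNR approximation $B\log_2(P\beta/(BN_0))$ obtained from (3.3). It assumes the parameter values B = 10 MHz and $\beta/N_0 = 10^8$ Hz/W stated for Figure 3.2; the function names are hypothetical.

```python
import math

B = 10e6            # bandwidth in Hz (Figure 3.2)
BETA_OVER_N0 = 1e8  # beta / N0 in Hz/W (Figure 3.2)


def capacity(P, B=B, beta_over_n0=BETA_OVER_N0):
    """Capacity (3.1) in bit/s for the transmit power P in W."""
    return B * math.log2(1 + P * beta_over_n0 / B)


def capacity_low_snr(P, beta_over_n0=BETA_OVER_N0):
    """Low-SNR approximation (3.8): linear in P and independent of B."""
    return math.log2(math.e) * P * beta_over_n0


def capacity_high_snr(P, B=B, beta_over_n0=BETA_OVER_N0):
    """High-SNR approximation based on (3.3): logarithmic in P."""
    return B * math.log2(P * beta_over_n0 / B)


for P in [0.001, 0.01, 0.1, 1.0, 10.0]:
    snr_db = 10 * math.log10(P * BETA_OVER_N0 / B)
    print(f"P = {P:6.3f} W, SNR = {snr_db:5.1f} dB, "
          f"C = {capacity(P) / 1e6:7.2f} Mbit/s, "
          f"low-SNR approx = {capacity_low_snr(P) / 1e6:8.2f} Mbit/s, "
          f"high-SNR approx = {capacity_high_snr(P) / 1e6:7.2f} Mbit/s")
```

The low-SNR approximation is accurate for small P, while the high-SNR approximation only becomes meaningful once the SNR is well above 0 dB; at the smallest powers it is even negative.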


Example 3.2. Consider the capacity in (3.1) in a scenario where P and B
have been selected such that $P\beta/(BN_0) = 1$. Suppose we change the transmit
power from P to cP for some scalar c > 0. Which values of c will double and
quadruple the capacity (compared to c = 1)?

The capacity in (3.1) becomes $C = B\log_2(1+1) = B$ under the assumption
that $P\beta/(BN_0) = 1$ (i.e., when c = 1). Our first target is to double the capacity
to 2B by increasing the transmit power to cP. This means that

$$ B\log_2\left(1 + \frac{cP\beta}{BN_0}\right) = 2B \;\Leftrightarrow\; \log_2(1+c) = 2 \;\Leftrightarrow\; c = 2^2 - 1 = 3. \tag{3.9} $$

Hence, we need to triple the transmit power to double the capacity.
Next, we want to find the value of c that gives the capacity 4B:

$$ B\log_2\left(1 + \frac{cP\beta}{BN_0}\right) = 4B \;\Leftrightarrow\; \log_2(1+c) = 4 \;\Leftrightarrow\; c = 2^4 - 1 = 15. \tag{3.10} $$

Hence, we must transmit 15 times more power to quadruple the capacity.
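The reasoning in Example 3.2 generalizes: multiplying the capacity by a factor k requires solving $\log_2(1 + c\cdot\mathrm{SNR}) = k\log_2(1+\mathrm{SNR})$, which gives $c = ((1+\mathrm{SNR})^k - 1)/\mathrm{SNR}$ and reduces to $c = 2^k - 1$ at unit SNR. A minimal sketch (the function name is hypothetical) that also recomputes the capacity ratio as a check:

```python
import math


def power_scaling_for_capacity_factor(k, snr=1.0):
    """Power factor c that multiplies the capacity B*log2(1 + snr) by k.

    Solves log2(1 + c*snr) = k*log2(1 + snr), i.e., c = ((1 + snr)**k - 1) / snr.
    For snr = 1 (Example 3.2), this reduces to c = 2**k - 1.
    """
    return ((1 + snr) ** k - 1) / snr


for k in [2, 4]:
    c = power_scaling_for_capacity_factor(k)
    ratio = math.log2(1 + c) / math.log2(1 + 1.0)  # the bandwidth B cancels out
    print(f"k = {k}: c = {c:g}, capacity ratio = {ratio:g}")

# A low-SNR (power-limited) operating point chosen for illustration
print(f"snr = 0.01, k = 2: c = {power_scaling_for_capacity_factor(2, snr=0.01):.4f}")
```

At unit SNR the sketch reproduces c = 3 and c = 15. At the illustrative low SNR of 0.01, doubling the capacity requires only about 2.01 times the power, consistent with the near-linear low-SNR behavior in (3.8).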
Figure 3.3: The capacity behavior in a single-antenna system when changing the bandwidth
B, for $P\beta/N_0 = 5\cdot 10^8$ Hz.


When varying the bandwidth B, we notice that $C(P,B)$ is also a monotonically
increasing function of this variable, which can be shown by taking the first
derivative and proving that it is positive. The capacity starts at $C(P,0) = 0$,
which can be shown by taking the limit $B \to 0$. This represents a high SNR
region where the SNR $P\beta/(BN_0) \to \infty$, but the capacity is nevertheless low due to
the small bandwidth. In the high SNR region, the capacity grows almost linearly
with B, since the factor in front of the logarithm in (3.1) grows linearly while
the logarithm is almost unaffected by a small change in B. If we instead consider
the case when B is large, we can utilize that we operate in the low SNR region
where $P\beta/(BN_0)$ is small, thus

$$ C(P,B) \approx B\log_2(e)\frac{P\beta}{BN_0} = \log_2(e)\frac{P\beta}{N_0}. \tag{3.11} $$

One can prove that $C(P,B) \to \log_2(e)P\beta/N_0$ as $B \to \infty$, so there is an upper
limit on how high a capacity we can achieve with a huge bandwidth.
The reason is that the fixed transmit power P needs to be divided over the
bandwidth, leading to a gradually lower SNR when using more bandwidth.
This is directly seen from the signal energy per symbol $q = P/B$ used in
Corollary 2.1. Figure 3.3 illustrates these behaviors by showing $C(P,B)$ as a
function of B for $P\beta/N_0 = 5\cdot 10^8$ Hz. The capacity grows linearly with B in
the high SNR region but converges to an upper limit in the low SNR region.
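A quick numerical illustration of the saturation, assuming $P\beta/N_0 = 5\cdot 10^8$ Hz as in Figure 3.3 (the helper name is hypothetical): the capacity keeps increasing with B but never exceeds $\log_2(e)P\beta/N_0 \approx 721$ Mbit/s.

```python
import math

P_BETA_OVER_N0 = 5e8  # P * beta / N0 in Hz (Figure 3.3)


def capacity_vs_bandwidth(B, p_beta_over_n0=P_BETA_OVER_N0):
    """Capacity (3.1) in bit/s as a function of the bandwidth B in Hz."""
    return B * math.log2(1 + p_beta_over_n0 / B)


# Upper limit as B -> infinity: log2(e) * P * beta / N0
c_limit = math.log2(math.e) * P_BETA_OVER_N0

for B in [1e6, 1e7, 1e8, 1e9, 1e10, 1e11]:
    c = capacity_vs_bandwidth(B)
    print(f"B = {B:8.0e} Hz: C = {c / 1e6:7.1f} Mbit/s "
          f"({100 * c / c_limit:5.1f}% of the limit {c_limit / 1e6:.1f} Mbit/s)")
```

Each tenfold bandwidth increase yields a smaller relative gain once the SNR $P\beta/(BN_0)$ drops below one, which is the power-limited behavior described above.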

With these behaviors in mind, we can conclude how to improve the channel 
capacity most efficiently in different cases. If we have a system that operates 
in the high SNR region, the capacity grows linearly with the bandwidth B but 
relatively slowly with the power P. Since changes in the bandwidth greatly
impact the capacity, the high SNR region is called the bandwidth-limited
region. In contrast, if we have a system that operates in the low SNR region,
the capacity grows linearly with the power, while the bandwidth has little
impact. Since changes in the power strongly impact the capacity, the low SNR
region is called the power-limited region. Alternatively, we can increase both
P and B while keeping their ratio P/B fixed. In that case, the SNR $P\beta/(BN_0)$ is
constant, and the capacity (3.1) will always be linearly increasing, irrespective
of the SNR value. The intuition is that we get more symbols per second, and 
each can carry the same amount of information since we keep the energy per 
symbol constant by increasing the transmit power at the same pace as we 
increase the bandwidth (i.e., the number of symbols per second). For example, 
if we need to double the capacity of a system, we can achieve that using twice 
the power and twice the bandwidth. If the original system operates in the 
power-limited region, we can achieve almost the same capacity gain by only 
doubling the power. On the other hand, if the original system operates in the 
bandwidth-limited region, we can achieve almost the same capacity gain by 
only doubling the bandwidth. However, in general, we need to increase both 
the power and bandwidth to achieve a significant capacity gain. 
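The rules of thumb above can be checked with a short sketch. It assumes two hypothetical operating points, SNR = 0.01 (power-limited) and SNR = 1000 (bandwidth-limited), and reports how much (3.1) grows when doubling P, doubling B, or doubling both.

```python
import math


def capacity(P, B, beta_over_n0):
    """Capacity (3.1) in bit/s."""
    return B * math.log2(1 + P * beta_over_n0 / B)


def gains(P, B, beta_over_n0):
    """Capacity ratios when doubling P, doubling B, or doubling both."""
    c0 = capacity(P, B, beta_over_n0)
    return {
        "2P": capacity(2 * P, B, beta_over_n0) / c0,
        "2B": capacity(P, 2 * B, beta_over_n0) / c0,
        "2P and 2B": capacity(2 * P, 2 * B, beta_over_n0) / c0,
    }


B = 10e6
for label, snr in [("power-limited", 0.01), ("bandwidth-limited", 1000.0)]:
    P = 1.0
    beta_over_n0 = snr * B / P  # chosen so that P * beta / (B * N0) = snr
    ratios = gains(P, B, beta_over_n0)
    print(label, {k: round(v, 3) for k, v in ratios.items()})
```

At the power-limited point, doubling P gives a ratio of about 1.99 while doubling B gives about 1.00; at the bandwidth-limited point, doubling B gives about 1.80 and doubling P about 1.10. Doubling both always gives a ratio of 2 because the SNR is unchanged.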


3.2 Capacity of SIMO Channels 


We now know how the channel capacity is affected by power and bandwidth. 
To maximize the capacity, the communication systems should be designed to 
use the maximum available transmit power and bandwidth. This is rather 
obvious and has been the standard practice for decades. The purpose of 
multiple antenna communications is to design systems to further enhance the 
capacity without requiring more transmit power and bandwidth resources. 

It is vital to notice that it is not the transmit power P that determines
the channel capacity but the received power $P\beta$. If we want to achieve a
higher received power, we can increase P. Alternatively, we can use multiple
receive antennas to capture a larger share of the transmitted power, thereby
increasing $\beta$. This case will be considered in this section, where the goal is to
characterize the channel capacity when having multiple receive antennas. 

A channel with one transmit antenna and multiple receive antennas is 
called a SIMO channel; see Figure 3.1(b). We denote the number of receive 
antennas as M. The channel to each receive antenna can be modeled as before, 
using the discrete memoryless channel model in (2.130). However, the channel 
responses will generally differ for every antenna, and the additive noise is 
statistically independent since it is created by randomness in the receiver 
hardware connected to the respective receive antennas. Hence, the received 
signal at the mth receive antenna is given by 


$$ y_m[l] = h_m x[l] + n_m[l], \quad \text{for } m = 1,\ldots,M, \tag{3.12} $$
Figure 3.4: A discrete memoryless SIMO channel with the input $x[l]$ and M outputs $y_m[l] =
h_m x[l] + n_m[l]$, for $m = 1,\ldots,M$, where l is a discrete time index, $h_m$ is the channel response to
the mth receive antenna, and $n_m[l]$ is the independent Gaussian receiver noise at that antenna.


where $x[l]$ is the transmitted signal, l is the discrete time index, $h_m$ is the
channel response, and $n_m[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the independent receiver noise.
Note that the transmitted signal is the same for all m, while all other variables
have an antenna index. A block diagram of this discrete memoryless SIMO
channel is shown in Figure 3.4. Since this is a memoryless channel, we can
just as well neglect the time index l and write the channel in (3.12) as

$$ y_m = h_m x + n_m, \quad \text{for } m = 1,\ldots,M. \tag{3.13} $$


Instead of representing the transmission over the SIMO channel using the M
equations in (3.13), it is convenient to represent the entire system model in
vector form as

$$ \mathbf{y} = \mathbf{h}x + \mathbf{n} \tag{3.14} $$

by defining the M-dimensional received signal vector y, the channel vector h,
and the noise vector n:

$$ \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix}, \quad \mathbf{h} = \begin{bmatrix} h_1 \\ \vdots \\ h_M \end{bmatrix}, \quad \mathbf{n} = \begin{bmatrix} n_1 \\ \vdots \\ n_M \end{bmatrix}. \tag{3.15} $$

The geometric relation between these vectors is illustrated in Figure 3.5. The
received signal vector y is the summation of two vectors: hx and n. The
Figure 3.5: The received signal vector y is the summation of the noise vector n and the channel
vector h that is multiplied by the data signal x. 


former is a vector that points in the same direction as the channel vector h
but is scaled by the unknown data signal x. The latter is the noise vector
with independent entries distributed as $\mathcal{N}_{\mathbb{C}}(0, N_0)$. We can express the entire
distribution as $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ using the multivariate notation introduced in
Section 2.2.4, where $\mathrm{Cov}\{\mathbf{n}\} = N_0\mathbf{I}_M$ is the covariance matrix. The direction
$\mathbf{n}/\|\mathbf{n}\|$ of the noise vector is uniformly distributed over all possible directions.
The word "direction" refers to the geometry in the M-dimensional vector
space $\mathbb{C}^M$ where these vectors reside. There is no simple connection to physical
directions in our three-dimensional world, but we will return to the physical 
modeling of channels in Chapter 4. 

The receiver wants to detect the data signal x based on the received signal
y. Since the received signal is a vector and the data signal is a scalar, the
detection algorithm must somehow include a projection of y onto a scalar
that we call $\hat{x}$. The projection should make $\hat{x}$ as similar to x as possible, and
there should be no information loss in the projection. In general, a vector
projection is carried out by selecting a unit-length vector w and computing
the inner product $\hat{x} = \mathbf{w}^H\mathbf{y}$. This scalar represents how far y extends in the
direction w; that is, $\hat{x}$ is the (orthogonal) projection of y onto w. The vector
w is called the receive combining vector when dealing with SIMO channels, 
and it can also be called the detection vector or receive beamforming vector. 

We want to find the capacity of the SIMO channel in (3.14). As a first 
step, we will compute an achievable data rate for an arbitrary w and then 
identify the best projection, which is the one that gives the channel capacity. 
We notice that 


$$ \hat{x} = \mathbf{w}^H\mathbf{y} = \mathbf{w}^H\mathbf{h}x + \mathbf{w}^H\mathbf{n}, \tag{3.16} $$
where $\mathbf{w}^H\mathbf{h}$ is a scalar and $\mathbf{w}^H\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the component of the noise
that points in the direction of w.¹ Hence, (3.16) is effectively a memoryless
SISO channel of the kind in (2.130) with $y = \hat{x}$ and $h = \mathbf{w}^H\mathbf{h}$. It then follows
from Corollary 2.1 that an achievable data rate is

$$ \log_2\left(1 + \frac{q|\mathbf{w}^H\mathbf{h}|^2}{N_0}\right) \text{ bit/symbol}, \tag{3.17} $$


where $q = \mathbb{E}\{|x|^2\} = P/B$ denotes the energy per symbol, which we will
refer to as the symbol power in the remainder of this book. This variable is 
proportional to the transmit power P, so when we later optimize the symbol 
powers of multiple data streams, this is identical to optimizing the transmit 
powers (measured in Watt, i.e., energy per second). 

The value in (3.17) depends on how we select the unit-length vector w.
Recall that the Cauchy-Schwarz inequality in (2.18) states that

$$ |\mathbf{w}^H\mathbf{h}|^2 \leq \underbrace{\|\mathbf{w}\|^2}_{=1}\|\mathbf{h}\|^2 = \|\mathbf{h}\|^2 \tag{3.18} $$

with equality if and only if w and h are parallel. Hence, we can maximize the
SNR $q|\mathbf{w}^H\mathbf{h}|^2/N_0$ in (3.17) by selecting the unit-length vector

$$ \mathbf{w} = \frac{\mathbf{h}}{\|\mathbf{h}\|} \tag{3.19} $$

that is parallel to h. By inserting (3.19) into (3.17), we obtain the achievable
data rate

$$ \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) \text{ bit/symbol}. \tag{3.20} $$


The receive combining vector in (3.19) is called maximum-ratio combining 
(MRC) since it maximizes the SNR. It has also been called the matched filter 
since the combining vector is effectively matched to the channel. Recall from 
Figure 2.4 that the inner product with a unit-length vector can be interpreted 
as an orthogonal projection onto that vector. In this case, we take the received 
signal vector y and project it onto a unit-length version of the channel vector 
h, as illustrated in Figure 3.6. Since the received signal contains the data 
signal with the form hx, the projection will not remove any part of the data
signal. The projection will, however, remove the parts of the noise vector 
n that point in other directions than h. This noise suppression approach is 
conceptually similar to the lowpass filtering in Figure 2.13, where the receiver 
removes the noise in the part of the frequency domain where there is no signal. 
In the case of MRC, we instead remove noise from the part of the spatial 
domain where there is no signal. 
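The optimality of MRC can also be observed numerically. The sketch below draws a hypothetical channel realization with i.i.d. $\mathcal{N}_{\mathbb{C}}(0,1)$ entries (an assumption made only for illustration, with arbitrary M, q, and $N_0$), evaluates the SNR $q|\mathbf{w}^H\mathbf{h}|^2/N_0$ in (3.17) for the MRC vector (3.19), and compares it with 1000 randomly drawn unit-length vectors; by (3.18), none of them should exceed $q\|\mathbf{h}\|^2/N_0$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
M, q, N0 = 8, 1.0, 0.5  # illustrative values, not from the text


def crandn(*shape):
    """Samples with i.i.d. N_C(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def snr(w, h):
    """SNR q|w^H h|^2 / N0 from (3.17), after normalizing w to unit length."""
    w = w / np.linalg.norm(w)
    return q * abs(np.vdot(w, h)) ** 2 / N0  # np.vdot conjugates its first argument


h = crandn(M)                              # hypothetical channel realization
w_mrc = h / np.linalg.norm(h)              # MRC vector (3.19)
snr_mrc = snr(w_mrc, h)
snr_max = q * np.linalg.norm(h) ** 2 / N0  # SNR in (3.20)
snr_random = max(snr(crandn(M), h) for _ in range(1000))

print(f"MRC: {snr_mrc:.4f}, q||h||^2/N0: {snr_max:.4f}, best random: {snr_random:.4f}")
```

The MRC SNR coincides with $q\|\mathbf{h}\|^2/N_0$ up to rounding, while the best of the random combining vectors falls short of it.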


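¹ Since $\mathbf{w}^H\mathbf{n}$ is the weighted sum of independent complex Gaussian distributed random
variables, it is also complex Gaussian distributed. Since the mean is zero, the variance is
computed as $\mathrm{Var}\{\mathbf{w}^H\mathbf{n}\} = \mathbb{E}\{|\mathbf{w}^H\mathbf{n}|^2\} = \mathbf{w}^H\mathbb{E}\{\mathbf{n}\mathbf{n}^H\}\mathbf{w} = N_0\mathbf{w}^H\mathbf{I}_M\mathbf{w} = N_0\|\mathbf{w}\|^2 = N_0$.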
Figure 3.6: To achieve the SIMO capacity, we should use MRC to project the received signal
y orthogonally onto the channel vector h. The data-bearing vector hx is unaffected by this
projection, but the parts of the noise that gave y another direction will be removed.


In estimation theory, $\hat{x}$ is called a sufficient statistic for estimating x,
since the projection only removes noise components orthogonal to h, which are
independent of both x and the retained noise. Since MRC is the optimal
projection, the achievable data rate in (3.20) is the channel capacity of the
SIMO channel.


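Corollary 3.1. Consider the discrete memoryless point-to-point SIMO channel
in Figure 3.4 with the input $x \in \mathbb{C}$ and output $\mathbf{y} \in \mathbb{C}^M$ given by

$$ \mathbf{y} = \mathbf{h}x + \mathbf{n}, \tag{3.21} $$

where $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is independent noise. Suppose the input distribution
is feasible whenever the symbol power satisfies $\mathbb{E}\{|x|^2\} \leq q$ and $\mathbf{h} \in \mathbb{C}^M$ is a
constant vector known at the output. The channel capacity is

$$ C = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) \text{ bit/symbol} \tag{3.22} $$

and is achieved when the input is distributed as $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$.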


When comparing the SIMO capacity expression in (3.22) with the SISO
capacity in (2.145), we notice that the only difference is the channel gain. It
is $|h|^2$ in the SISO case and has now been replaced by $\|\mathbf{h}\|^2 = \sum_{m=1}^{M}|h_m|^2$,
which is the summation of the individual channel gains to all the M receive
antennas. Hence, using multiple receive antennas leads to a beamforming gain
compared to having a single antenna. For example, if $h_m = h$ for $m = 1,\ldots,M$,
then $\|\mathbf{h}\|^2 = M|h|^2$ and the SNR will grow proportionally to the number of
antennas. This is the beamforming gain introduced in Section 1.2.1, and it
can be used either to get a larger SNR or to reduce q by a factor 1/M
to get the same SNR as in the single-antenna case using less transmit power.

As explained in Section 2.4.1, we can express the symbol power as $q = P/B$
and multiply the capacity expression in (3.22) by B to change the unit to
bit/s. This leads to the alternative SIMO channel capacity expression

$$ C = B\log_2\left(1 + \frac{P\|\mathbf{h}\|^2}{BN_0}\right) \text{ bit/s}. \tag{3.23} $$


Example 3.3. Consider a SIMO system with M antennas and $\mathbf{h} = \sqrt{\beta}[1\ \ldots\ 1]^T$.
What is the capacity $C_{\mathrm{SIMO}}$? Determine the relative capacity gain $C_{\mathrm{SIMO}}/C_{\mathrm{SISO}}$
compared with the capacity $C_{\mathrm{SISO}}$ of the corresponding SISO system.

The capacity of this SIMO system is computed using (3.23) as

$$ C_{\mathrm{SIMO}} = B\log_2\left(1 + \frac{P\|\mathbf{h}\|^2}{BN_0}\right) = B\log_2\left(1 + \frac{PM\beta}{BN_0}\right) \text{ bit/s} \tag{3.24} $$

since $\|\mathbf{h}\|^2 = \sum_{m=1}^{M}|h_m|^2 = M\beta$ in this case. The corresponding SISO system
with M = 1 has the capacity

$$ C_{\mathrm{SISO}} = B\log_2\left(1 + \frac{P\beta}{BN_0}\right) \text{ bit/s}. \tag{3.25} $$

The SIMO system provides an M times larger SNR than the SISO system.
Using the low-SNR approximation in (3.2), the relative capacity gain becomes

$$ \frac{C_{\mathrm{SIMO}}}{C_{\mathrm{SISO}}} \approx \frac{B\log_2(e)\frac{PM\beta}{BN_0}}{B\log_2(e)\frac{P\beta}{BN_0}} = M, \tag{3.26} $$

which grows linearly with the number of antennas and equals the beamforming
gain. Using (3.3), the relative capacity gain at high SNR is approximated as

$$ \frac{C_{\mathrm{SIMO}}}{C_{\mathrm{SISO}}} \approx \frac{B\log_2\left(\frac{PM\beta}{BN_0}\right)}{B\log_2\left(\frac{P\beta}{BN_0}\right)} = 1 + \frac{\log_2(M)}{\log_2\left(\frac{P\beta}{BN_0}\right)}. \tag{3.27} $$

The relative capacity gain only grows logarithmically with M at high SNR. The
absolute difference becomes $C_{\mathrm{SIMO}} - C_{\mathrm{SISO}} \approx B\log_2(M)$ at high SNR.

In summary, since the beamforming gain increases the received power, the
most significant relative capacity gain is achieved in the power-limited region
where the SNR is low, while the gain is small in the bandwidth-limited region.
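The transition between the two regimes in Example 3.3 can be tabulated with a short sketch. It assumes M = 16 and a few SISO SNR values $P\beta/(BN_0)$ chosen for illustration; the bandwidth cancels in the ratio. The high-SNR approximation (3.27) is only evaluated when the SNR exceeds one, since it is meaningless otherwise.

```python
import math

M = 16  # illustrative number of receive antennas


def capacity_ratio(snr, M):
    """Exact C_SIMO / C_SISO from (3.24)-(3.25), where snr = P*beta/(B*N0)."""
    return math.log2(1 + M * snr) / math.log2(1 + snr)


def ratio_high_snr(snr, M):
    """High-SNR approximation (3.27)."""
    return 1 + math.log2(M) / math.log2(snr)


for snr_db in [-30, -10, 10, 30]:
    snr = 10 ** (snr_db / 10)
    exact = capacity_ratio(snr, M)
    high = ratio_high_snr(snr, M) if snr > 1 else float("nan")
    print(f"SNR = {snr_db:4d} dB: exact ratio = {exact:6.2f}, "
          f"low-SNR approx (3.26) = {M}, high-SNR approx (3.27) = {high:6.2f}")
```

At -30 dB the exact ratio is close to M = 16, whereas at 30 dB it drops to roughly 1.4, in agreement with (3.27).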
3.2.1 Alternative Combining Vectors


The derivation of MRC relied on the assumption that w is a unit-length
vector, but this condition can be relaxed without changing the final result.
For the sake of argument, suppose we select $\mathbf{w} = c\mathbf{h}$ for some arbitrary
scaling factor $c \neq 0$. Substituting this vector into (3.16) yields

$$ \hat{x} = \mathbf{w}^H\mathbf{y} = c^*\underbrace{\mathbf{h}^H\mathbf{h}}_{=\|\mathbf{h}\|^2}x + \underbrace{c^*\mathbf{h}^H\mathbf{n}}_{\sim\mathcal{N}_{\mathbb{C}}(0,\,|c|^2\|\mathbf{h}\|^2N_0)}, \tag{3.28} $$

which is a SISO channel with $h = c^*\|\mathbf{h}\|^2$ and noise with the variance
$|c|^2\|\mathbf{h}\|^2N_0$. It follows from Corollary 2.1 that an achievable data rate is

$$ \log_2\left(1 + \frac{q|c|^2\|\mathbf{h}\|^4}{|c|^2\|\mathbf{h}\|^2N_0}\right) = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) \text{ bit/symbol}, \tag{3.29} $$

which equals the capacity in (3.22). Hence, any combining vector parallel to
h can be utilized to achieve the capacity.

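In practical implementations, it might be desirable to identify the value of
c that minimizes the MSE between the transmitted signal x and its estimate
$\hat{x} = c^*\|\mathbf{h}\|^2x + c^*\mathbf{h}^H\mathbf{n}$ in (3.28):

$$ \begin{aligned} \mathbb{E}\{|x-\hat{x}|^2\} &= \mathbb{E}\left\{\left|x\left(1-c^*\|\mathbf{h}\|^2\right) - c^*\mathbf{h}^H\mathbf{n}\right|^2\right\} \\ &\overset{(a)}{=} \mathbb{E}\{|x|^2\}\left|1-c^*\|\mathbf{h}\|^2\right|^2 + \mathbb{E}\{|c^*\mathbf{h}^H\mathbf{n}|^2\} \\ &= q\left(1 + |c|^2\|\mathbf{h}\|^4 - c\|\mathbf{h}\|^2 - c^*\|\mathbf{h}\|^2\right) + |c|^2\|\mathbf{h}\|^2N_0 \\ &\overset{(b)}{=} \left|c - \frac{q}{q\|\mathbf{h}\|^2+N_0}\right|^2\left(q\|\mathbf{h}\|^4 + N_0\|\mathbf{h}\|^2\right) + \frac{qN_0}{q\|\mathbf{h}\|^2+N_0}, \end{aligned} \tag{3.30} $$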


where (a) follows from utilizing the independence between the signal x and
the noise n (which both have zero mean), while (b) follows from completing
the square with respect to the variable c. Since the first term in (3.30) is a
squared magnitude multiplied by a positive constant, it cannot be negative.
Hence, the MSE is minimized by selecting c to make the first term equal to zero,
which is achieved by $c = \frac{q}{q\|\mathbf{h}\|^2+N_0}$. This results in the alternative MRC vector

$$ \mathbf{w} = \frac{q}{q\|\mathbf{h}\|^2+N_0}\mathbf{h} \tag{3.31} $$

that will simultaneously achieve the capacity and minimize the MSE between
the transmitted data symbol and the receiver's estimate $\hat{x}$. This is a suitable
scaling factor since many decoding algorithms use Euclidean distances between
constellation points and received signals when determining the likelihood of
different symbols being transmitted, which is aligned with the MSE being
the average squared Euclidean distance. The capacity can be achieved using
MRC with any scaling factor $c \neq 0$ because the capacity expression implicitly
assumes an optimal receiver, which can compensate for any scaling factor. In
general, any receiver processing that is invertible has no impact on capacity.
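The claims around (3.29)-(3.31) can be checked with a small sketch: the MSE in (3.30) is smallest at $c = q/(q\|\mathbf{h}\|^2 + N_0)$, where it equals $qN_0/(q\|\mathbf{h}\|^2+N_0)$, while the SNR in (3.29) does not depend on c. The channel realization, M, q, and $N_0$ are hypothetical, and a Monte Carlo average over Gaussian symbols is included as an independent check of the closed-form MSE.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
M, q, N0 = 4, 1.0, 0.5  # illustrative values, not from the text
h = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
g = np.linalg.norm(h) ** 2  # channel gain ||h||^2


def mse(c):
    """MSE in (3.30) for the combining vector w = c h."""
    return q * abs(1 - np.conj(c) * g) ** 2 + abs(c) ** 2 * g * N0


def snr(c):
    """SNR of the combiner output in (3.29); the factor |c|^2 cancels for c != 0."""
    return q * abs(c) ** 2 * g ** 2 / (abs(c) ** 2 * g * N0)


c_opt = q / (q * g + N0)         # MSE-minimizing scaling in (3.31)
mse_min = q * N0 / (q * g + N0)  # remaining term in (3.30)

# Monte Carlo check with x ~ N_C(0, q) and n ~ N_C(0, N0 I_M)
K = 200_000
x = np.sqrt(q / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
n = np.sqrt(N0 / 2) * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))
y = np.outer(h, x) + n           # column k is h x_k + n_k
x_hat = np.conj(c_opt * h) @ y   # x_hat = w^H y with w = c_opt h
mse_empirical = np.mean(np.abs(x - x_hat) ** 2)

print(f"c_opt = {c_opt:.4f}, MSE(c_opt) = {mse(c_opt):.4f}, qN0/(q||h||^2 + N0) = {mse_min:.4f}")
print(f"Monte Carlo MSE: {mse_empirical:.4f}")
print(f"MSE(0.5 c_opt) = {mse(0.5 * c_opt):.4f}, MSE(2 c_opt) = {mse(2 * c_opt):.4f}")
print(f"SNR for c = c_opt, 1, 10: {snr(c_opt):.4f}, {snr(1.0):.4f}, {snr(10.0):.4f}")
```

The three SNR values coincide, illustrating that the scaling only affects the MSE and not the achievable rate.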
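Example 3.4. Suppose the received signal is $\mathbf{y} = \mathbf{h}x + \mathbf{n}$ as earlier in this
section, but the noise is colored in the sense that $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C})$. What is the
LMMSE estimate of x given the received signal?

The LMMSE estimator concept was described in Section 2.5.2. It obtains
an estimate of x through a linear operation: $\hat{x} = \mathbf{w}^H\mathbf{y}$. Hence, we need to find
the combining vector w that minimizes the MSE, which can be expressed as

$$ \begin{aligned} \mathbb{E}\{|x-\hat{x}|^2\} &= \mathbb{E}\left\{\left|x(1-\mathbf{w}^H\mathbf{h}) - \mathbf{w}^H\mathbf{n}\right|^2\right\} \\ &\overset{(a)}{=} \mathbb{E}\{|x|^2\}\left|1-\mathbf{w}^H\mathbf{h}\right|^2 + \mathbb{E}\{|\mathbf{w}^H\mathbf{n}|^2\} \\ &= q\left(1 + \mathbf{w}^H\mathbf{h}\mathbf{h}^H\mathbf{w} - \mathbf{w}^H\mathbf{h} - \mathbf{h}^H\mathbf{w}\right) + \mathbf{w}^H\mathbf{C}\mathbf{w} \\ &= q + \mathbf{w}^H\underbrace{\left(q\mathbf{h}\mathbf{h}^H + \mathbf{C}\right)}_{=\mathbf{B}}\mathbf{w} - \mathbf{w}^H\underbrace{q\mathbf{h}}_{=\mathbf{a}} - \underbrace{q\mathbf{h}^H}_{=\mathbf{a}^H}\mathbf{w}, \end{aligned} \tag{3.32} $$

where (a) follows from utilizing that the signal and noise are independent. By
using the notation a and B introduced in (3.32), we can rewrite the MSE as

$$ \mathbb{E}\{|x-\hat{x}|^2\} = q + \mathbf{w}^H\mathbf{B}\mathbf{w} - \mathbf{w}^H\mathbf{a} - \mathbf{a}^H\mathbf{w} = q - \mathbf{a}^H\mathbf{B}^{-1}\mathbf{a} + \left(\mathbf{w}-\mathbf{B}^{-1}\mathbf{a}\right)^H\mathbf{B}\left(\mathbf{w}-\mathbf{B}^{-1}\mathbf{a}\right) \tag{3.33} $$

by completing the square with respect to the vector w. Since B is positive definite,
the last term is a quadratic form that attains its minimum value of zero if
$\mathbf{w} = \mathbf{B}^{-1}\mathbf{a}$. We can utilize the matrix identity in (2.49) to rewrite the expression as

$$ \mathbf{w} = \mathbf{B}^{-1}\mathbf{a} = q\left(q\mathbf{h}\mathbf{h}^H + \mathbf{C}\right)^{-1}\mathbf{h} = \frac{q}{1 + q\mathbf{h}^H\mathbf{C}^{-1}\mathbf{h}}\mathbf{C}^{-1}\mathbf{h}. \tag{3.34} $$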
This vector is called LMMSE combining since it minimizes the MSE. It 
can also be proved to be the capacity-achieving combining scheme for the 
considered channel. LMMSE combining reduces to the MRC vector in (3.31) 
in the special case of $\mathbf{C} = N_0\mathbf{I}_M$. The LMMSE combining terminology is
usually only used when it differs from conventional MRC; that is, when there 
is colored noise or interference, which we will come across later in the book. 
Otherwise, it is referred to as MRC, as earlier in this section. 
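The LMMSE combiner in (3.34) can be sketched as follows. The colored-noise covariance C is a randomly generated positive definite matrix (a hypothetical example), and the code checks that the two expressions in (3.34) coincide, that the resulting MSE equals $q - \mathbf{a}^H\mathbf{B}^{-1}\mathbf{a}$ from (3.33), and that it is no larger than the MSE obtained with the scaled MRC vector (3.31). For that comparison, $N_0$ is set to the average noise power per antenna, which is an assumption made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
M, q = 4, 1.0  # illustrative values, not from the text


def crandn(*shape):
    """Samples with i.i.d. N_C(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def mse(w, h, C, q):
    """MSE from (3.32): q|1 - w^H h|^2 + w^H C w."""
    return q * abs(1 - np.vdot(w, h)) ** 2 + np.vdot(w, C @ w).real


def lmmse_combiner(h, C, q):
    """LMMSE combining w = B^{-1} a with B = q h h^H + C and a = q h, as in (3.34)."""
    B = q * np.outer(h, h.conj()) + C
    return np.linalg.solve(B, q * h)


h = crandn(M)
G = crandn(M, M)
C = G @ G.conj().T + 0.1 * np.eye(M)  # hypothetical colored-noise covariance (positive definite)

w_lmmse = lmmse_combiner(h, C, q)

# Alternative form in (3.34), obtained with the matrix inversion lemma
Cinv_h = np.linalg.solve(C, h)
w_alt = q / (1 + q * np.vdot(h, Cinv_h).real) * Cinv_h

# Scaled MRC vector (3.31) with N0 set to the average noise power (an assumption),
# which ignores the noise coloring
N0 = np.trace(C).real / M
w_mrc = q * h / (q * np.linalg.norm(h) ** 2 + N0)

print("Both forms of (3.34) agree:", np.allclose(w_lmmse, w_alt))
print(f"MSE with LMMSE combining: {mse(w_lmmse, h, C, q):.4f}")
print(f"MSE with scaled MRC:      {mse(w_mrc, h, C, q):.4f}")
print(f"Minimum MSE q - a^H B^-1 a from (3.33): {q - np.vdot(q * h, w_lmmse).real:.4f}")
```

Solving the linear system with `np.linalg.solve` instead of forming $\mathbf{B}^{-1}$ explicitly is a common numerical choice; calling `lmmse_combiner` with $\mathbf{C} = N_0\mathbf{I}_M$ returns the MRC vector (3.31).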


3.3 Capacity of MISO Channels 


We will now consider the opposite scenario of a channel with multiple transmit
antennas and a single receive antenna, known as a MISO channel; see Figure 3.1(c). To emphasize the similarities with the SIMO case considered in the
previous section, we consider the case when the transmitter and receiver from
the SIMO channel have exchanged their roles. Hence, we assume there are $M$
transmit antennas, and the channel response from transmit antenna $m$ to
the receive antenna is denoted by $h_m$.
Figure 3.7: A discrete memoryless MISO channel with the inputs $x_m[l]$ for $m = 1,\ldots,M$
and output $y[l] = \sum_{m=1}^{M} h_m x_m[l] + n[l]$, where $l$ is a discrete time index, $h_m$ is the channel
response from transmit antenna $m$, and $n[l]$ is the independent complex Gaussian receiver noise.


The channel from each transmit antenna to the receive antenna can be 
described by the discrete memoryless channel model in (2.130), but when we 
put it all together, we get the received signal 


$$
y[l] = \sum_{m=1}^{M} h_m x_m[l] + n[l],
\tag{3.35}
$$


where $l$ is the discrete time index, $x_m[l]$ is the transmitted signal from antenna
$m$, $h_m$ is the channel response from transmit antenna $m$, and $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$
is the receiver noise. A block diagram of this discrete memoryless MISO
channel is shown in Figure 3.7. Notice that there is only a single noise term
and that the signal contributions $h_m x_m[l]$ from the different antennas are
added together (superimposed) by the wireless channel. This makes the setup
analytically different from the SIMO case. Since (3.35) is a memoryless channel,
we can just as well neglect the time index and write the channel as


$$
y = \sum_{m=1}^{M} h_m x_m + n.
\tag{3.36}
$$
To derive the channel capacity, it will be helpful to use the vector notation 
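$$
\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_M \end{bmatrix}, \qquad \mathbf{h} = \begin{bmatrix} h_1 \\ \vdots \\ h_M \end{bmatrix},
\tag{3.37}
$$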
where $\mathbf{x}$ is the signal vector and $\mathbf{h}$ is the channel vector. With this notation,
we can rewrite the system model in (3.36) as 


$$
y = \mathbf{h}^T\mathbf{x} + n.
\tag{3.38}
$$


Two different types of transposes were defined in Section 2.1.1 to be used
when dealing with complex vectors and matrices: the conventional transpose
$(\cdot)^T$ that flips a matrix over its diagonal and the conjugate transpose $(\cdot)^H$ that
both flips the matrix and replaces each entry with its complex conjugate.
The conjugate transpose is probably the most common when dealing with
complex vectors/matrices due to its connection to the inner product and norm.
Nevertheless, it is a conventional transpose on $\mathbf{h}$ in (3.38) because the physical
channels do not give rise to any complex conjugation.² Recall from (2.17) that
the inner product between two arbitrary complex-valued vectors $\mathbf{a}$ and $\mathbf{b}$ of
the same dimension is computed as $\mathbf{a}^H\mathbf{b}$ using the conjugate transpose. Hence,
the term $\mathbf{h}^T\mathbf{x}$ in (3.38) is an inner product between $\mathbf{h}^*$ and $\mathbf{x}$.

The $M$-dimensional signal vector $\mathbf{x}$ should be selected to send data to the
receiver. Since the receiver only observes the scalar $y$, it can only estimate one
scalar data-bearing signal based on its observation.³ Hence, we can, without
loss of optimality, select the signal vector as


$$
\mathbf{x} = \mathbf{p}\varsigma,
\tag{3.39}
$$


where $\mathbf{p}$ is an $M$-dimensional unit-length vector and $\varsigma$ is the data signal
having the symbol power $\mathbb{E}\{|\varsigma|^2\} = q$. The vector $\mathbf{p}$ is called the precoding
vector or transmit beamforming vector, and the unit-length requirement means
that the total symbol power of the transmitted signal is


$$
\mathbb{E}\{\|\mathbf{x}\|^2\} = \mathbb{E}\{\underbrace{\|\mathbf{p}\|^2}_{=1}|\varsigma|^2\} = \mathbb{E}\{|\varsigma|^2\} = q,
\tag{3.40}
$$


independently of how many antennas are used. This effectively means that 
the more transmit antennas are used, the less power is transmitted from each 
one of the antennas. By substituting (3.39) into (3.38), we obtain 


$$
y = \mathbf{h}^T\mathbf{p}\varsigma + n,
\tag{3.41}
$$


where $\mathbf{h}^T\mathbf{p}$ is a scalar. This scalar is the inner product between the conjugate
$\mathbf{h}^*$ of the channel and the precoding vector $\mathbf{p}$. Hence, (3.41) is effectively


²Many other textbooks on multiple antenna communications, however, write (3.38) as
$y = \mathbf{h}^H\mathbf{x} + n$ since the use of a conjugate transpose makes the analysis/notation slightly simpler.
The downside with that approach is that the obtained algorithms cannot be directly applied to
a practical system; we must first compensate for the conjugation.

³It is theoretically possible to send more than one data-bearing signal to a single-antenna
receiver, but it can be proved that this will not increase the capacity of the system since the
channel will add these signals together when taking the inner product $\mathbf{h}^T\mathbf{x}$.
Figure 3.8: A MISO channel projects the channel vector $\mathbf{h}^*$ onto the unit-length precoding
vector $\mathbf{p}$, and it is $|\mathbf{h}^T\mathbf{p}|$ that determines the SNR. Hence, if the precoding vector $\mathbf{p}$ is not
parallel to the channel vector $\mathbf{h}^*$, the SNR will be the same as if a shorter channel vector $|\mathbf{h}^T\mathbf{p}|\mathbf{p}$
parallel to $\mathbf{p}$ was used.


a memoryless SISO channel of the kind in (2.130) with $h = \mathbf{h}^T\mathbf{p}$ and noise
variance $N_0$. It then follows from Corollary 2.1 that an achievable data rate is


$$
\log_2\left(1 + \frac{q|\mathbf{h}^T\mathbf{p}|^2}{N_0}\right) \quad \text{bit/symbol}.
\tag{3.42}
$$


To obtain the channel capacity, it remains to identify the precoding vector
that maximizes (3.42), which corresponds to maximizing $|\mathbf{h}^T\mathbf{p}|^2$. As in the
last section, we can utilize the Cauchy-Schwarz inequality from (2.18), which
states that

$$
|\mathbf{h}^T\mathbf{p}|^2 \leq \|\mathbf{h}^*\|^2 \underbrace{\|\mathbf{p}\|^2}_{=1} = \|\mathbf{h}\|^2
\tag{3.43}
$$

with equality if and only if $\mathbf{h}^*$ and $\mathbf{p}$ are parallel. Note that $\|\mathbf{h}^*\|^2 = \|\mathbf{h}\|^2$
since the conjugate only changes the phase of the entries, not their magnitudes.
Hence, we can maximize the SNR $q|\mathbf{h}^T\mathbf{p}|^2/N_0$ in (3.42) by selecting the precoding
vector as

$$
\mathbf{p} = \frac{\mathbf{h}^*}{\|\mathbf{h}\|},
\tag{3.44}
$$

which is a unit-length vector parallel to $\mathbf{h}^*$. This precoding gives the achievable
data rate

$$
\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) \quad \text{bit/symbol}.
\tag{3.45}
$$


The precoding vector in (3.44) is called maximum-ratio transmission (MRT) 
since it maximizes the SNR. It has also been called conjugate beamforming 
since the precoding vector is selected based on the complex conjugate of the 
channel vector. This selection of the precoding vector is intuitive if we look at
it geometrically as in Figure 3.8: $|\mathbf{h}^T\mathbf{p}|$ is the length of the effective channel
vector that is obtained when orthogonally projecting $\mathbf{h}^*$ onto $\mathbf{p}$. This vector
has the same length as $\mathbf{h}^*$ (i.e., the same norm) only when $\mathbf{h}^*$ and $\mathbf{p}$ are parallel,
which is the case with MRT. For the sake of argument, suppose we select
another precoding vector $\mathbf{p}$ that is not parallel to $\mathbf{h}^*$. The component of
this precoding vector that is orthogonal to the conjugate of the channel (in
the vector space $\mathbb{C}^M$) will vanish when taking the inner product $\mathbf{h}^T\mathbf{p}$, and
the corresponding transmit power is lost. In conclusion, MRT is the optimal
precoding and the channel capacity is the achievable data rate in (3.45).
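The optimality of MRT can also be checked numerically. The sketch below (Python with NumPy; the function names and the random channel with hypothetical values of $M$, $q$, $N_0$ are our own choices) evaluates (3.42) for the MRT vector in (3.44) and for many random unit-norm precoders. Note that `h @ p` computes $\mathbf{h}^T\mathbf{p}$ without any conjugation, matching (3.38).

```python
import numpy as np

def mrt_precoder(h):
    """MRT precoding vector p = h^* / ||h|| from (3.44)."""
    return h.conj() / np.linalg.norm(h)

def miso_rate(h, p, q, N0):
    """Achievable rate (3.42) in bit/symbol; h @ p is h^T p (no conjugation)."""
    return np.log2(1 + q * np.abs(h @ p) ** 2 / N0)

rng = np.random.default_rng(2)
M, q, N0 = 8, 1.0, 1.0
h = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)

p_mrt = mrt_precoder(h)
capacity = np.log2(1 + q * np.linalg.norm(h) ** 2 / N0)   # (3.45)

# 1000 random unit-norm precoders (one per row); P @ h gives h^T p for each row p
P = rng.standard_normal((1000, M)) + 1j * rng.standard_normal((1000, M))
P /= np.linalg.norm(P, axis=1, keepdims=True)
random_rates = np.log2(1 + q * np.abs(P @ h) ** 2 / N0)

print(miso_rate(h, p_mrt, q, N0), capacity, random_rates.max())
```

By the Cauchy-Schwarz inequality in (3.43), no row of `P` can exceed the MRT rate, and rotating `p_mrt` by a common phase $e^{j\phi}$ leaves the rate unchanged.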


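Corollary 3.2. Consider the discrete memoryless point-to-point MISO channel
in Figure 3.7 with the input $\mathbf{x} \in \mathbb{C}^M$ and output $y \in \mathbb{C}$ given by

$$
y = \mathbf{h}^T\mathbf{x} + n,
\tag{3.46}
$$

where $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. Suppose the input distribution is
feasible whenever the symbol power satisfies $\mathbb{E}\{\|\mathbf{x}\|^2\} \leq q$ and $\mathbf{h} \in \mathbb{C}^M$ is a
constant vector known at the input and output. The channel capacity is

$$
C = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) \quad \text{bit/symbol}
\tag{3.47}
$$

and is achieved when the input is $\mathbf{x} = \frac{\mathbf{h}^*}{\|\mathbf{h}\|}\varsigma$ with $\varsigma \sim \mathcal{N}_{\mathbb{C}}(0,q)$.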


Comparing the MISO channel capacity in (3.47) with the capacity expression in (3.22) of the corresponding SIMO channel, we notice that these are
identical. Hence, the benefit of transmitting from $M$ antennas is that the
channel gain $\|\mathbf{h}\|^2 = \sum_{m=1}^{M} |h_m|^2$ becomes the sum of the channel gains of
the individual antennas. If $h_m = h$ for $m = 1,\ldots,M$, then $\|\mathbf{h}\|^2 = M|h|^2$ and
the SNR is precisely proportional to the number of antennas. This gain is
achieved by directing the transmission towards the receiver, as illustrated in
Figure 1.17 and Figure 1.19. Another similarity is that the capacity-achieving
combining and precoding vectors, called MRC and MRT, respectively, are
equal except for a complex conjugate:

$$
\mathbf{w} = \mathbf{p}^*.
\tag{3.48}
$$

In fact, MRT and MRC process the channel vector identically, so the conjugate
in (3.48) is merely due to notational differences: the combining vector is applied
as $\mathbf{w}^H\mathbf{h}$ with a conjugate transpose, while the precoding vector is applied as
$\mathbf{h}^T\mathbf{p}$ without a conjugate, so the conjugate must be placed in $\mathbf{p}$ beforehand.

Even if the channel capacities are equal, there are essential differences 
between the SIMO and MISO channels. When transmitting from M antennas 
to a single-antenna receiver, the transmit power is directed towards that 
receiver, as illustrated in Figure 1.17 and Figure 1.19. MRT basically selects 
the time delays of the different signals to achieve constructive interference 
at the point of the receiver; thus, the radiated signal resembles that of a 
directive transmit antenna but with the critical difference that the directivity 
is adapted to the channel. The precoding and directivity of the transmission
will change when the channel changes, which cannot happen when using a
directive antenna. In contrast, when a single-antenna device transmits to a
receiver equipped with $M$ antennas, the emitted signal propagates isotropically
as illustrated in Figure 1.16 (or according to some other fixed antenna gain
function, such as the one in Figure 1.10). Each receive antenna observes one
component of the signal in additive noise with variance $N_0$. MRC combines
the signal components constructively, while the noise components are neither
constructively nor destructively combined, so the resulting noise term $\mathbf{w}^H\mathbf{n}$
still has variance $N_0$ when $\mathbf{w}$ has unit norm. The combining creates a spatially directive reception
resembling that of a directive receive antenna but with the vital difference
that the directivity is adapted to the direction of the arriving signal.


Example 3.5. Is the MRT vector $\mathbf{p} = \frac{\mathbf{h}^*}{\|\mathbf{h}\|}$ unique, or are there capacity-achieving alternatives similar to the alternative MRC vectors in Section 3.2.1?

The precoding vector is selected under the constraint that $\|\mathbf{p}\| = 1$, which
ensures that the symbol power equals the power of the data signal. This is a
crucial difference from the selection of combining vectors, which can be scaled
arbitrarily since the scaling factor affects the signal and noise identically.
However, there is still some flexibility in the MRT vector. The derivation in
(3.43) is based on the Cauchy-Schwarz inequality, where the maximum value
is achieved when $\mathbf{h}^*$ and $\mathbf{p}$ are parallel. All the unit-norm vectors that satisfy
this condition are MRT vectors and can be expressed as $\mathbf{p} = e^{j\phi}\frac{\mathbf{h}^*}{\|\mathbf{h}\|}$, where
the common phase-shift $\phi \in [-\pi,\pi)$ can be selected arbitrarily.


MRT effectively turns a MISO channel into a SISO channel with an 
improved SNR, and the same applies when using MRC in SIMO channels. 
Hence, in practice, the data encoding and decoding can be carried out like in 
SISO systems. For example, Figure 2.18 gave an example of 28 data rates that 
can be achieved by selecting different MCS combinations in 5G NR. When the 
capacity has been computed using the expressions provided in this chapter, 
we can identify the closest smaller data rate in the table and use that MCS. 
The same table can be utilized irrespective of how many antennas are utilized 
or whether it is a SIMO or MISO channel. In fact, a base station can hide the 
fact that it is equipped with multiple antennas from the user devices, which 
has the positive side-effect that one can add beamforming functionalities into 
existing systems without changing the fundamental communication protocols. 

As explained in Section 2.4.1, we can also express the symbol power as 
q = P/B and multiply the capacity expression in (3.47) with B to change the 
unit to bit/s. This leads to the alternative but equivalent way to write the 
capacity of a MISO channel as 


$$
C = B\log_2\left(1 + \frac{P\|\mathbf{h}\|^2}{BN_0}\right) \quad \text{bit/s}.
\tag{3.49}
$$
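The unit conversion between (3.47) and (3.49) is easy to check in code. The Python sketch below implements both expressions and, assuming every antenna has the same gain $|h_m|^2 = \beta$ so that $\|\mathbf{h}\|^2 = M\beta$, prints the capacity for a few array sizes. All numerical values (power, bandwidth, noise spectral density, $\beta$) and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def miso_capacity_bit_per_symbol(q, channel_gain, N0):
    """(3.47): log2(1 + q ||h||^2 / N0)."""
    return np.log2(1 + q * channel_gain / N0)

def miso_capacity_bit_per_second(P, B, channel_gain, N0):
    """(3.49): B log2(1 + P ||h||^2 / (B N0))."""
    return B * np.log2(1 + P * channel_gain / (B * N0))

# Hypothetical numbers, chosen only for illustration
P, B, N0 = 0.1, 20e6, 4e-21     # W, Hz, W/Hz
beta = 1e-10                    # assumed equal gain |h_m|^2 = beta on every antenna
for M in (1, 4, 16, 64):
    gain = M * beta             # ||h||^2 = M * beta when all |h_m| are equal
    C = miso_capacity_bit_per_second(P, B, gain, N0)
    print(f"M = {M:2d}: C = {C / 1e6:.1f} Mbit/s")
```

Substituting $q = P/B$ into the bit/symbol version and multiplying by $B$ gives the same number, which is the equivalence stated above.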
Example 3.6. Suppose we transmit the signal $\mathbf{x} = \mathbf{p}_1\varsigma_1 + \mathbf{p}_2\varsigma_2$, where
$\mathbf{p}_1, \mathbf{p}_2$ are two unit-norm precoding vectors and $\varsigma_1, \varsigma_2 \sim \mathcal{N}_{\mathbb{C}}(0,q/2)$ are
independent data signals, each containing half the power. How large a data rate can
we achieve over a MISO channel? Can we achieve the capacity?

The received signal in (3.38) now becomes 


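$$
y = \mathbf{h}^T(\mathbf{p}_1\varsigma_1 + \mathbf{p}_2\varsigma_2) + n = \mathbf{h}^T\mathbf{p}_1\varsigma_1 + \mathbf{h}^T\mathbf{p}_2\varsigma_2 + n.
\tag{3.50}
$$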


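We need to detect the signal $\varsigma_1$ under the independent additive distortion
$\mathbf{h}^T\mathbf{p}_2\varsigma_2 + n \sim \mathcal{N}_{\mathbb{C}}\left(0, \frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2 + N_0\right)$, which contains both interference from $\varsigma_2$ and noise.
Since $\varsigma_2$ is unknown, it is indistinguishable from the noise, and we can achieve
a data rate similar to (3.42) but with the noise variance $\frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2 + N_0$:

$$
R_1 = \log_2\left(1 + \frac{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_1|^2}{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2 + N_0}\right) \quad \text{bit/symbol}.
\tag{3.51}
$$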


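Once we have decoded the data contained in $\varsigma_1$, we know the term $\mathbf{h}^T\mathbf{p}_1\varsigma_1$
in (3.50) and can subtract it from the received signal: $y - \mathbf{h}^T\mathbf{p}_1\varsigma_1 = \mathbf{h}^T\mathbf{p}_2\varsigma_2 + n$.
This residual received signal is of the kind in (3.41), and the data rate that
we can achieve when extracting the data contained in $\varsigma_2$ is

$$
R_2 = \log_2\left(1 + \frac{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2}{N_0}\right) \quad \text{bit/symbol}.
\tag{3.52}
$$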


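The total data rate of this system is

$$
\begin{aligned}
R_1 + R_2 &= \log_2\left(1 + \frac{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_1|^2}{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2 + N_0}\right) + \log_2\left(1 + \frac{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2}{N_0}\right) \\
&= \log_2\left(\frac{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_1|^2 + \frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2 + N_0}{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2 + N_0} \cdot \frac{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2 + N_0}{N_0}\right) \\
&= \log_2\left(1 + \frac{\frac{q}{2}|\mathbf{h}^T\mathbf{p}_1|^2 + \frac{q}{2}|\mathbf{h}^T\mathbf{p}_2|^2}{N_0}\right) \\
&\leq \log_2\left(1 + \frac{\frac{q}{2}\|\mathbf{h}\|^2 + \frac{q}{2}\|\mathbf{h}\|^2}{N_0}\right) = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right),
\end{aligned}
\tag{3.53}
$$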
0 0 


where the upper bound follows from (3.43) and is achieved by MRT with $\mathbf{p}_1 = \mathbf{p}_2 = \frac{\mathbf{h}^*}{\|\mathbf{h}\|}$,
which has the largest inner product with the conjugate of the channel vector. The
rate expression in (3.53) then coincides with the capacity in (3.47). Hence, we
have identified an alternative way to achieve the capacity, but it is more
complicated since we transmit two independent data signals and decode them
sequentially. The solution in Corollary 3.2 is therefore preferable.
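The successive decoding in Example 3.6 can be mirrored in a few lines of Python. The sketch below (function and variable names are hypothetical; the channel and parameters are random illustration values) computes $R_1$ by treating the second data signal as noise, then $R_2$ after perfect cancellation, and compares the sum with the MISO capacity for random precoders and for MRT on both signals.

```python
import numpy as np

def sic_rates(h, p1, p2, q, N0):
    """Rates (3.51)-(3.52): decode signal 1 treating signal 2 as noise,
    subtract it, then decode signal 2 interference-free."""
    g1 = np.abs(h @ p1) ** 2          # |h^T p1|^2
    g2 = np.abs(h @ p2) ** 2          # |h^T p2|^2
    R1 = np.log2(1 + (q / 2) * g1 / ((q / 2) * g2 + N0))
    R2 = np.log2(1 + (q / 2) * g2 / N0)
    return R1, R2

def random_unit(rng, M):
    """Hypothetical helper: a random complex unit-norm vector."""
    v = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(3)
M, q, N0 = 4, 1.0, 0.1
h = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
capacity = np.log2(1 + q * np.linalg.norm(h) ** 2 / N0)    # (3.47)

p1, p2 = random_unit(rng, M), random_unit(rng, M)
R1, R2 = sic_rates(h, p1, p2, q, N0)

p_mrt = h.conj() / np.linalg.norm(h)
R1_mrt, R2_mrt = sic_rates(h, p_mrt, p_mrt, q, N0)
print(R1 + R2, R1_mrt + R2_mrt, capacity)
```

The sum rate with random precoders stays below the capacity, while MRT on both signals reaches it, as in (3.53).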
Figure 3.9: A discrete memoryless MIMO channel with the inputs $x_k[l]$ for $k = 1,\ldots,K$ and
outputs $y_m[l] = \sum_{k=1}^{K} h_{m,k}x_k[l] + n_m[l]$ for $m = 1,\ldots,M$, where $l$ is a discrete time index,
$h_{m,k}$ is the channel response from transmit antenna $k$ to receive antenna $m$, and $n_m[l]$ is the
independent complex Gaussian receiver noise at receive antenna $m$.


3.4 Capacity of MIMO Channels 


We will conclude this chapter by considering the most general point-to-point 
scenario: the MIMO channel illustrated in Figure 3.1(d). We assume there are 
K transmit antennas and M receive antennas; thus, we need two indices to 
denote each channel response: $h_{m,k} \in \mathbb{C}$ is the channel response from transmit
antenna k to receive antenna m, for k = 1,...,K and m = 1,...,M. By 
modeling the channel between each transmit antenna and receive antenna 
using the discrete memoryless channel model in (2.130), the received signal 
at antenna m becomes 


$$
y_m[l] = \sum_{k=1}^{K} h_{m,k}x_k[l] + n_m[l], \quad \text{for } m = 1,\ldots,M,
\tag{3.54}
$$


where $l$ is the discrete time index, $x_k[l]$ is the transmitted signal from antenna $k$,
and $n_m[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the receiver noise that is independent across antennas.
A block diagram of the MIMO channel in (3.54) is shown in Figure 3.9. Note 
that the SISO, SIMO, and MISO channels are all special cases of the MIMO 
channel considered in this section. 


To derive the MIMO channel capacity, we need to utilize all the $M$ received
signals $y_1[l],\ldots,y_M[l]$ for joint signal detection, which calls for a vector/matrix
representation of (3.54). If we use the memoryless channel property to drop
Figure 3.10: A discrete memoryless MIMO channel with vector input $\mathbf{x} \in \mathbb{C}^K$ and vector
output $\mathbf{y} \in \mathbb{C}^M$. The channel is characterized by the $M \times K$ channel matrix $\mathbf{H}$ and the receiver
noise vector $\mathbf{n} \in \mathbb{C}^M$, which contains $M$ independent complex Gaussian variables. This block
diagram is equivalent to the one in Figure 3.9 but uses the vector/matrix notation.


the time index l, the complete received signal at an arbitrary time instance is 


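$$
\begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix}
= \begin{bmatrix} \sum_{k=1}^{K} h_{1,k}x_k \\ \vdots \\ \sum_{k=1}^{K} h_{M,k}x_k \end{bmatrix}
+ \begin{bmatrix} n_1 \\ \vdots \\ n_M \end{bmatrix}
= \begin{bmatrix} h_{1,1} & \cdots & h_{1,K} \\ \vdots & \ddots & \vdots \\ h_{M,1} & \cdots & h_{M,K} \end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_K \end{bmatrix}
+ \begin{bmatrix} n_1 \\ \vdots \\ n_M \end{bmatrix}.
\tag{3.55}
$$

This system model can be written in a concise matrix form as

$$
\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}
\tag{3.56}
$$

by defining the $M \times K$ channel matrix

$$
\mathbf{H} = \begin{bmatrix} h_{1,1} & \cdots & h_{1,K} \\ \vdots & \ddots & \vdots \\ h_{M,1} & \cdots & h_{M,K} \end{bmatrix}
\tag{3.57}
$$

and the vectors

$$
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_K \end{bmatrix}, \quad
\mathbf{n} = \begin{bmatrix} n_1 \\ \vdots \\ n_M \end{bmatrix}.
\tag{3.58}
$$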


Note that the transmitted data signal vector $\mathbf{x}$ is $K$-dimensional since there are
$K$ transmit antennas, while the received signal vector $\mathbf{y}$ and the noise vector $\mathbf{n}$
are $M$-dimensional since there are $M$ receive antennas. Since the noise terms
are independent, the noise vector $\mathbf{n}$ has the distribution $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$.
Figure 3.10 shows a block diagram of (3.56) that is equivalent to Figure 3.9
but uses the matrix/vector notation, which makes it more concise.

The main goal of this section is to compute the channel's capacity from $\mathbf{x}$
to $\mathbf{y}$ under a constraint on the maximum symbol power. We let $q$ denote the
total symbol power of all antennas, which implies that $\mathbb{E}\{\|\mathbf{x}\|^2\} = q$, where
the mean is computed since the data signal vector $\mathbf{x}$ is random. The matrix
form in (3.56) invites us to apply linear algebra results to determine how the
transmitter and receiver should process their signals. We will use the following
matrix factorization, called the singular-value decomposition (SVD) [49].


Lemma 3.1. Every complex $M \times K$ matrix $\mathbf{H}$ can be factorized as

$$
\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H,
\tag{3.59}
$$

where $\mathbf{U}$ is a unitaryᵃ $M \times M$ matrix containing the eigenvectors of $\mathbf{H}\mathbf{H}^H$,
$\mathbf{V}$ is a unitary $K \times K$ matrix containing the eigenvectors of $\mathbf{H}^H\mathbf{H}$, and $\boldsymbol{\Sigma}$
is a rectangular $M \times K$ diagonal matrixᵇ with the real numbers $s_1 \geq \ldots \geq s_{\min(M,K)} \geq 0$ on the diagonal.

ᵃUnitary matrices are described in Definition 2.4.
ᵇA rectangular diagonal matrix of size $M \times K$ can be viewed as a diagonal matrix of size
$\min(M,K) \times \min(M,K)$ that has been appended with zeros to become an $M \times K$ matrix.


The SVD can factorize an arbitrary matrix using two specific unitary
matrices, $\mathbf{U}$ and $\mathbf{V}$, whose columns are called the left and right singular vectors.
The non-negative numbers $s_1,\ldots,s_{\min(M,K)}$ are ordered in
non-increasing order and are called the singular values of $\mathbf{H}$.


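Example 3.7. Compute $\mathbf{H}\mathbf{H}^H$ and $\mathbf{H}^H\mathbf{H}$ using the SVD of $\mathbf{H}$ from (3.59).
How are the eigenvalues of $\mathbf{H}\mathbf{H}^H$ and $\mathbf{H}^H\mathbf{H}$ related to the singular values of $\mathbf{H}$?

We can express $\mathbf{H}\mathbf{H}^H$ using the SVD of $\mathbf{H}$ from (3.59) as

$$
\mathbf{H}\mathbf{H}^H = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\left(\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\right)^H = \mathbf{U}\boldsymbol{\Sigma}\underbrace{\mathbf{V}^H\mathbf{V}}_{=\mathbf{I}_K}\boldsymbol{\Sigma}^H\mathbf{U}^H = \mathbf{U}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^H\mathbf{U}^H,
\tag{3.60}
$$

where we utilized that $\mathbf{V}$ is a unitary matrix. We notice that $\boldsymbol{\Sigma}\boldsymbol{\Sigma}^H$ is a diagonal
matrix; thus, $\mathbf{U}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^H\mathbf{U}^H$ fits the eigendecomposition form in Lemma 2.1. Hence,
$\mathbf{U}$ contains the orthonormal eigenvectors of $\mathbf{H}\mathbf{H}^H$ and the $M \times M$ diagonal
matrix $\boldsymbol{\Sigma}\boldsymbol{\Sigma}^H$ contains the real-valued eigenvalues $s_1^2 \geq \ldots \geq s_{\min(M,K)}^2 \geq 0$,
and an additional $M - \min(M,K)$ zero-valued eigenvalues if $M > \min(M,K)$.
Similarly, we can express $\mathbf{H}^H\mathbf{H}$ using the SVD of $\mathbf{H}$ from (3.59) as

$$
\mathbf{H}^H\mathbf{H} = \left(\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\right)^H\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H = \mathbf{V}\boldsymbol{\Sigma}^H\underbrace{\mathbf{U}^H\mathbf{U}}_{=\mathbf{I}_M}\boldsymbol{\Sigma}\mathbf{V}^H = \mathbf{V}\boldsymbol{\Sigma}^H\boldsymbol{\Sigma}\mathbf{V}^H,
\tag{3.61}
$$

which we identify as the eigendecomposition of $\mathbf{H}^H\mathbf{H}$. The unitary matrix $\mathbf{V}$
contains the orthonormal eigenvectors and the $K \times K$ diagonal matrix $\boldsymbol{\Sigma}^H\boldsymbol{\Sigma}$
contains the real-valued eigenvalues, which are $s_1^2 \geq \ldots \geq s_{\min(M,K)}^2 \geq 0$ and
the additional $K - \min(M,K)$ zero eigenvalues if $K > \min(M,K)$.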




The SVD can be viewed as a generalization of the conventional eigendecomposition and can be used to diagonalize any matrix. In contrast, only
some square matrices can be diagonalized using the eigendecomposition. For
Hermitian positive semidefinite matrices, the SVD coincides with the eigendecomposition in
Lemma 2.1, in the sense that $\mathbf{U} = \mathbf{V}$ contains the eigenvectors and $\boldsymbol{\Sigma}$ contains
the corresponding eigenvalues.

The last example demonstrates a way to derive the singular values of H: 


1. Compute either $\mathbf{H}\mathbf{H}^H$ or $\mathbf{H}^H\mathbf{H}$ (preferably the one resulting in the
smallest matrix dimensions) and call it $\mathbf{A}$;


2. Compute the eigenvalues of $\mathbf{A}$ by finding the roots of its characteristic
polynomial $\det(\mathbf{A} - \lambda\mathbf{I})$;


3. Obtain the singular values by taking the square root of the eigenvalues.
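The three steps above can be sketched in Python with NumPy. In this sketch, a numerical Hermitian eigenvalue solver (`np.linalg.eigvalsh`) stands in for the root-finding in step 2, the function name is our own, and NumPy's built-in `np.linalg.svd` is used only as a reference for comparison on a random hypothetical matrix.

```python
import numpy as np

def singular_values_via_gram(H):
    """Steps 1-3: eigenvalues of the smaller of H H^H and H^H H, then square roots."""
    M, K = H.shape
    A = H.conj().T @ H if K <= M else H @ H.conj().T   # min(M, K) x min(M, K)
    lam = np.linalg.eigvalsh(A)                          # real eigenvalues, ascending
    lam = np.clip(lam, 0.0, None)                        # drop tiny negative round-off
    return np.sqrt(lam[::-1])                            # non-increasing order

rng = np.random.default_rng(4)
H = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
print(singular_values_via_gram(H))
print(np.linalg.svd(H, compute_uv=False))   # reference values from NumPy's SVD
```

Squaring the singular values can cost accuracy for very small ones, which is one reason numerical libraries compute the SVD directly rather than through $\mathbf{A}$.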


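The SVD has the same structure for any matrix but with different values
in $\mathbf{U}$, $\boldsymbol{\Sigma}$, and $\mathbf{V}$. To derive the MIMO channel capacity, we specifically apply
the SVD $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ to the channel matrix in (3.57). Suppose the transmitter
creates its transmit signal as $\mathbf{x} = \mathbf{V}\tilde{\mathbf{x}}$ for some $\tilde{\mathbf{x}}$, while the receiver processes
its received signal $\mathbf{y}$ by multiplying it with $\mathbf{U}^H$ to obtain $\tilde{\mathbf{y}} = \mathbf{U}^H\mathbf{y}$. It then
follows that

$$
\begin{aligned}
\tilde{\mathbf{y}} &= \mathbf{U}^H\mathbf{H}\mathbf{x} + \mathbf{U}^H\mathbf{n} \\
&= \mathbf{U}^H\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\mathbf{V}\tilde{\mathbf{x}} + \mathbf{U}^H\mathbf{n} \\
&= \boldsymbol{\Sigma}\tilde{\mathbf{x}} + \tilde{\mathbf{n}},
\end{aligned}
\tag{3.62}
$$

where we defined $\tilde{\mathbf{n}} = \mathbf{U}^H\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ and notice that this “rotated”
noise vector has the same distribution as $\mathbf{n}$.⁴ The last equality in (3.62)
utilizes that $\mathbf{U}^H\mathbf{U} = \mathbf{I}_M$ and $\mathbf{V}^H\mathbf{V} = \mathbf{I}_K$ for unitary matrices. The proposed
transmitter and receiver processing is non-destructive, meaning that we can
get $\mathbf{y}$ back by computing $\mathbf{U}\tilde{\mathbf{y}}$ (since $\mathbf{U}\mathbf{U}^H = \mathbf{I}_M$). Similarly, any vector
$\mathbf{x}$ can be expressed as $\mathbf{V}\tilde{\mathbf{x}}$ by selecting $\tilde{\mathbf{x}} = \mathbf{V}^H\mathbf{x}$. Hence, there is no loss of
information when going from (3.56) to (3.62), and the channel capacities must
be identical. However, (3.62) will be more convenient to analyze since $\boldsymbol{\Sigma}$ is a
(rectangular) diagonal matrix.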
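The diagonalization in (3.62) can be verified for one channel use. The Python sketch below draws a hypothetical random $4 \times 3$ channel, computes its SVD with `np.linalg.svd` (which returns $\mathbf{V}^H$ rather than $\mathbf{V}$), precodes with $\mathbf{V}$, combines with $\mathbf{U}^H$, and compares the result with $\boldsymbol{\Sigma}\tilde{\mathbf{x}} + \tilde{\mathbf{n}}$. The dimensions, noise level, and random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, N0 = 4, 3, 0.5
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)               # full SVD: U is M x M, Vh = V^H is K x K
Sigma = np.zeros((M, K))
Sigma[: len(s), : len(s)] = np.diag(s)    # rectangular diagonal matrix
V = Vh.conj().T

# One channel use: precode with V at the transmitter, combine with U^H at the receiver
x_tilde = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
n = np.sqrt(N0 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = H @ (V @ x_tilde) + n
y_tilde = U.conj().T @ y
n_tilde = U.conj().T @ n                  # rotated noise, same norm as n

print(np.allclose(y_tilde, Sigma @ x_tilde + n_tilde))
```

Multiplying `y_tilde` by `U` recovers `y` exactly, which is the non-destructive property used in the argument above.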

Let $r$ denote the number of non-zero singular values of $\mathbf{H}$, which is equal to
the rank of $\boldsymbol{\Sigma}$ (and $\mathbf{H}$). This means that $s_1 > 0,\ldots,s_r > 0$ while the remaining
singular values are zero. It follows that $r \leq \min(M,K)$ since $\mathbf{H}$ has $\min(M,K)$
singular values. If $r < \min(M,K)$, it holds that $s_{r+1} = \ldots = s_{\min(M,K)} = 0$.
By utilizing $r$ and the fact that $\boldsymbol{\Sigma}$ is a rectangular diagonal matrix, we can
write (3.62) in scalar form as

$$
\tilde{y}_k = \begin{cases} s_k\tilde{x}_k + \tilde{n}_k, & \text{if } k \leq r, \\ \tilde{n}_k, & \text{if } k > r, \end{cases}
\tag{3.63}
$$


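⁴This can be proved by computing the covariance matrix of $\tilde{\mathbf{n}}$ as $\mathrm{Cov}\{\tilde{\mathbf{n}}\} = \mathbb{E}\{\tilde{\mathbf{n}}\tilde{\mathbf{n}}^H\} = \mathbf{U}^H\mathbb{E}\{\mathbf{n}\mathbf{n}^H\}\mathbf{U} = N_0\mathbf{U}^H\mathbf{I}_M\mathbf{U} = N_0\mathbf{I}_M$.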
(a) The transmitter and receiver processing that diagonalizes the MIMO channel.
(b) An equivalent representation with $r$ parallel SISO channels.

Figure 3.11: By utilizing the SVD $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ of the MIMO channel matrix, the transmitter
and receiver can process the signals as shown in (a) to achieve $r$ parallel SISO channels as shown
in (b). The channel response in each parallel channel is a non-zero singular value of $\mathbf{H}$.


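for $k = 1,\ldots,M$. Notice that the entries in (3.63) can be collected in vector
form as follows: $\tilde{\mathbf{y}} = [\tilde{y}_1,\ldots,\tilde{y}_M]^T$, $\tilde{\mathbf{x}} = [\tilde{x}_1,\ldots,\tilde{x}_K]^T$, and $\tilde{\mathbf{n}} = [\tilde{n}_1,\ldots,\tilde{n}_M]^T$.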

Interestingly, each of the first $r$ received signals $\tilde{y}_k$ in (3.63) only depends
on one channel response $s_k$ obtained from the SVD, one transmitted signal
parameter $\tilde{x}_k$, and one independent noise variable $\tilde{n}_k$. Hence, we can interpret
the first row of (3.63) as $r$ parallel discrete memoryless SISO channels
useful for communication. The processing that turns the MIMO channel into
$r$ parallel SISO channels is illustrated in Figure 3.11.


If $M > r$, there are $M - r$ additional received signals $\tilde{y}_{r+1},\ldots,\tilde{y}_M$ in (3.63)
that only contain the independent noise variables $\tilde{n}_{r+1},\ldots,\tilde{n}_M$. This happens,
in particular, when $M > K$ since $r$ cannot be larger than $\min(M,K) = K$; thus,
the transmitter sends a K-dimensional signal, while the receiver obtains a 
higher-dimensional received signal where the extra dimensions contain no 
signal information. We might also have r < min(M, K) when the channel 
matrix is rank-deficient so that we have fewer than min(M, K) useful parallel 
channels between the transmitter and receiver. The M — r received signals 
in (3.63) that only contain noise are not helpful for communication and are 
disregarded in the remainder of this chapter without loss of optimality. 
Example 3.8. Consider a discrete memoryless MIMO channel with the channel
matrix $\mathbf{H} \in \mathbb{C}^{6 \times 4}$. The eigenvalues of the matrix $\mathbf{H}\mathbf{H}^H$ are $\lambda_1 = 3$, $\lambda_2 = 2.1$,
$\lambda_3 = 1.7$, and $\lambda_4 = \lambda_5 = \lambda_6 = 0$. What is the rank $r$ of $\mathbf{H}$? What are the
expressions of the $r$ useful parallel SISO channels?

The singular values of $\mathbf{H}$ equal the square roots of the $\min(M,K) = 4$
largest eigenvalues of $\mathbf{H}\mathbf{H}^H$. Hence, $s_1 = \sqrt{\lambda_1} = \sqrt{3}$, $s_2 = \sqrt{\lambda_2} = \sqrt{2.1}$,
$s_3 = \sqrt{\lambda_3} = \sqrt{1.7}$, and $s_4 = \sqrt{\lambda_4} = 0$. The rank of $\mathbf{H}$ is $r = 3$ since there are
three non-zero singular values. In this case, we have $r < \min(M,K)$.

By substituting the three non-zero singular values into (3.63), we obtain the
following $r = 3$ parallel SISO channels that can be used for data transmission:

$$
\tilde{y}_1 = \sqrt{3}\,\tilde{x}_1 + \tilde{n}_1, \quad \tilde{y}_2 = \sqrt{2.1}\,\tilde{x}_2 + \tilde{n}_2, \quad \tilde{y}_3 = \sqrt{1.7}\,\tilde{x}_3 + \tilde{n}_3.
\tag{3.64}
$$
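To see that the numbers in Example 3.8 are consistent, the Python sketch below builds one hypothetical $6 \times 4$ channel with the stated singular values, using random unitary matrices obtained from a QR factorization (an illustrative construction, not one given in the text), and then checks the rank and the eigenvalues of $\mathbf{H}\mathbf{H}^H$.

```python
import numpy as np

rng = np.random.default_rng(6)
M, K = 6, 4
lam = np.array([3.0, 2.1, 1.7, 0.0])     # the min(M, K) largest eigenvalues of H H^H
s = np.sqrt(lam)                         # singular values s_1 >= ... >= s_4

def random_unitary(n):
    """Hypothetical helper: Q factor of a random complex matrix is unitary."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

U, V = random_unitary(M), random_unitary(K)
Sigma = np.zeros((M, K))
Sigma[:K, :K] = np.diag(s)
H = U @ Sigma @ V.conj().T               # a channel with the singular values above

r = np.linalg.matrix_rank(H)
eig_HHh = np.sort(np.linalg.eigvalsh(H @ H.conj().T))[::-1]
print(r, np.round(eig_HHh, 6))
```

Any other choice of unitary $\mathbf{U}$ and $\mathbf{V}$ gives the same rank and the same three parallel SISO channels in (3.64).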


It remains to compute the joint capacity of the $r$ parallel channels in (3.63).
We know from Corollary 2.1 how to compute the channel capacity of one
such channel, but we cannot directly use this result to deal with the parallel
channels in (3.63) since there is one thing that couples them: the transmitter
has a total symbol power $q$ that it must divide between $\tilde{x}_1,\ldots,\tilde{x}_K$, and we
need to find the optimal way to do this.
As a first step, we let $q_1,\ldots,q_K$ denote the symbol power of each of
these signals, such that $\mathbb{E}\{|\tilde{x}_k|^2\} = q_k$. These $K$ power variables must be
non-negative. It then follows that⁵


$$
q = \mathbb{E}\{\|\mathbf{x}\|^2\} = \mathbb{E}\{\|\tilde{\mathbf{x}}\|^2\} = \sum_{k=1}^{K}\mathbb{E}\{|\tilde{x}_k|^2\} = \sum_{k=1}^{K} q_k.
\tag{3.65}
$$


For any given values of $q_1,\ldots,q_K$, the maximum data rate is the sum of the
capacities of the individual channels, each obtained using Corollary 2.1:⁶

$$
\sum_{k=1}^{r}\log_2\left(1 + \frac{q_k s_k^2}{N_0}\right).
\tag{3.66}
$$

Since this expression only depends on the power variables $q_1,\ldots,q_r$, the values
that we assign to $q_{r+1},\ldots,q_K$ for the unused dimensions will not affect the
data rate. Hence, we can set $q_{r+1} = \ldots = q_K = 0$ so that all the available
power can be used for the $r$ parallel SISO channels between the transmitter
and receiver. The channel capacity of the MIMO channel is obtained by
maximizing (3.66) with respect to the allocation of power over $q_1,\ldots,q_r$,

⁵Note that $\|\mathbf{x}\|^2 = \mathbf{x}^H\mathbf{x} = \tilde{\mathbf{x}}^H\mathbf{V}^H\mathbf{V}\tilde{\mathbf{x}} = \tilde{\mathbf{x}}^H\tilde{\mathbf{x}} = \|\tilde{\mathbf{x}}\|^2$ since $\mathbf{V}$ is a unitary matrix.

⁶This step utilizes the fact that the transmitted signals $\tilde{x}_1,\ldots,\tilde{x}_K$ are independent. Hence,
the received signals $\tilde{y}_1,\ldots,\tilde{y}_M$ are also independent, which is a property that follows from the
fact that the noise terms $\tilde{n}_1,\ldots,\tilde{n}_M$ are independent in (3.63), so there is no reason to introduce
any statistical dependence between the parallel channels. More precisely, the differential entropy
of $\tilde{\mathbf{y}}$ is maximized when its entries are independent.
under the constraint that the total symbol power is $q$:

$$
C = \max_{\substack{q_1,\ldots,q_r \geq 0 \\ \sum_{k=1}^{r} q_k = q}} \; \sum_{k=1}^{r}\log_2\left(1 + \frac{q_k s_k^2}{N_0}\right).
\tag{3.67}
$$


To obtain the MIMO channel capacity, it remains to derive the capacity-achieving values of the power variables. Some power variables might be zero
at the optimal solution to (3.67). For the sake of argument, suppose we know
that $N_+ \in \{1,\ldots,r\}$ of the variables are non-zero. We can then be sure that
$q_1 > 0,\ldots,q_{N_+} > 0$ and $q_{N_+ + 1} = \ldots = q_r = 0$, because $s_1,\ldots,s_{N_+}$ are the
largest singular values.⁷ In this case, we observe that


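$$
\sum_{k=1}^{N_+}\log_2\left(1 + \frac{q_k s_k^2}{N_0}\right) = \sum_{k=1}^{N_+}\log_2\left(\frac{s_k^2}{N_0}\left(\frac{N_0}{s_k^2} + q_k\right)\right) = \sum_{k=1}^{N_+}\log_2\left(\frac{s_k^2}{N_0}\right) + \sum_{k=1}^{N_+}\log_2\left(\frac{N_0}{s_k^2} + q_k\right),
\tag{3.68}
$$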


where only the second term depends on the power variables and is the one 
that should be maximized. This term can be upper bounded by utilizing the 
following classical inequality of arithmetic and geometric means. 


Lemma 3.2. For any set of $n$ real positive numbers $z_1,\ldots,z_n$, it holds that

$$
\left(\prod_{k=1}^{n} z_k\right)^{1/n} \leq \frac{1}{n}\sum_{k=1}^{n} z_k.
\tag{3.69}
$$

The equality in (3.69) holds if and only if $z_1 = \ldots = z_n$.


We now apply Lemma 3.2 to the second term in (3.68) to obtain 


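$$
\begin{aligned}
\sum_{k=1}^{N_+}\log_2\left(\frac{N_0}{s_k^2} + q_k\right) &= \log_2\left(\prod_{k=1}^{N_+}\left(\frac{N_0}{s_k^2} + q_k\right)\right) = N_+\log_2\left(\left(\prod_{k=1}^{N_+}\left(\frac{N_0}{s_k^2} + q_k\right)\right)^{1/N_+}\right) \\
&\leq N_+\log_2\left(\frac{1}{N_+}\sum_{k=1}^{N_+}\left(\frac{N_0}{s_k^2} + q_k\right)\right) = N_+\log_2\left(\frac{q}{N_+} + \frac{1}{N_+}\sum_{k=1}^{N_+}\frac{N_0}{s_k^2}\right),
\end{aligned}
\tag{3.70}
$$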


⁷If this was not the case, we would have $q_k = 0$ for some $k \leq N_+$ and $q_i > 0$ for some
$i > N_+$. Since $s_k \geq s_i$, we can switch the power between $q_k$ and $q_i$ without reducing the
capacity (and strictly increasing it if $s_k > s_i$). Hence, without loss of optimality, we can assume
that only the $N_+$ largest singular values are used at the solution.
where the last equality follows from the fact that $\sum_{k=1}^{N_+} q_k = q$ due to the
constraint in (3.67). The upper bound in (3.70) is independent of the optimization variables. We can achieve this upper bound if the power variables
are selected to achieve equality in the inequality of arithmetic and geometric
means. From Lemma 3.2, we know that this happens if $\frac{N_0}{s_k^2} + q_k$ takes the same
value for $k = 1,\ldots,N_+$. If we call this common value $\mu_{N_+} > 0$, it follows from
$\frac{N_0}{s_k^2} + q_k = \mu_{N_+}$ that we should select the symbol powers to satisfy


$$
q_k = \mu_{N_+} - \frac{N_0}{s_k^2}, \quad \text{for } k = 1,\ldots,N_+.
\tag{3.71}
$$


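Moreover, the common value must be

$$
\mu_{N_+} = \frac{1}{N_+}\sum_{k=1}^{N_+}\left(\frac{N_0}{s_k^2} + q_k\right) = \frac{q}{N_+} + \frac{1}{N_+}\sum_{k=1}^{N_+}\frac{N_0}{s_k^2}
\tag{3.72}
$$

since this is the argument of the logarithm on the right-hand side of (3.70).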
We have now determined how to compute the optimal symbol powers if
we know that exactly $N_+$ power values will be non-zero. The remaining issue
is that the value of $N_+$ is not known in advance. As we increase $N_+$, we
maximize an expression in (3.68) with additional terms and power variables.
This might give the impression that the data rate will increase with $N_+$,
but we must recall that $N_+$ equals the number of channels we provide with
non-zero power. Some SISO channels might have such small singular values
that it is not helpful to allocate any power to them, even if we can. This can
be observed from the optimized expression for $q_k$ in (3.71), which becomes
negative for $k = N_+$ if $s_{N_+}$ is so small that $N_0/s_{N_+}^2$ is larger than $\mu_{N_+}$. We
should reduce $N_+$ when that happens. On the other hand, if we select $N_+$ too
small, then $\mu_{N_+} - \frac{N_0}{s_k^2} > 0$ not only for $k \in \{1,\ldots,N_+\}$ but also for $k = N_+ + 1$.
This indicates that we should increase $N_+$ to find the solution.
The final solution is to select the capacity-achieving symbol powers as

$$
q_k = \max\left(\mu - \frac{N_0}{s_k^2}, 0\right), \quad k = 1,\ldots,r,
\tag{3.73}
$$

where we choose the value of $\mu \in \{\mu_1,\ldots,\mu_r\}$ that results in $\sum_{k=1}^{r} q_k = q$.
This condition is satisfied only when choosing the value $\mu = \mu_{N_+}$ that gives
exactly $N_+$ non-zero powers, while all other options will assign too little or
too much power. We have now proved the following MIMO capacity.
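The search for $N_+$ described above can be sketched as a short Python function. This is our own illustrative implementation, not an algorithm given in the text: it sorts the channels from strongest to weakest, tries $N_+ = r, r-1, \ldots, 1$, computes $\mu_{N_+}$ from (3.72), and keeps the largest $N_+$ for which the weakest active channel still receives positive power, after which (3.73) gives the powers. It assumes $q > 0$ and that only the $r$ non-zero singular values are passed in; the usage values ($s_k$ from Example 3.8 with $q = N_0 = 1$) are hypothetical.

```python
import numpy as np

def water_filling(s, q, N0):
    """Water-filling power allocation (3.71)-(3.73) over channels with non-zero
    singular values s. Returns the powers q_k (same order as s) and mu."""
    s = np.asarray(s, dtype=float)
    order = np.argsort(s)[::-1]            # strongest channel first
    floor = N0 / s[order] ** 2             # tank-bottom heights N0 / s_k^2, increasing
    r = len(s)
    for n_plus in range(r, 0, -1):         # try N_+ = r, r-1, ..., 1
        mu = (q + floor[:n_plus].sum()) / n_plus   # common value (3.72)
        if mu > floor[n_plus - 1]:                  # weakest active channel gets q_k > 0
            break
    powers = np.empty(r)
    powers[order] = np.maximum(mu - floor, 0.0)     # (3.73)
    return powers, mu

def parallel_capacity(powers, s, N0):
    """Sum rate (3.66) in bit/symbol."""
    return np.sum(np.log2(1 + powers * np.asarray(s) ** 2 / N0))

# Hypothetical use with the singular values of Example 3.8 and q = N0 = 1
s = np.sqrt([3.0, 2.1, 1.7])
powers, mu = water_filling(s, q=1.0, N0=1.0)
print(powers, powers.sum(), parallel_capacity(powers, s, 1.0))
```

The linear scan over $N_+$ is chosen for clarity rather than speed. With a much smaller budget, for example `water_filling(s, q=0.1, N0=1.0)`, only the strongest channel receives power.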
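Theorem 3.1. Consider the discrete memoryless point-to-point MIMO channel
in Figure 3.10 with the input $\mathbf{x} \in \mathbb{C}^K$ and output $\mathbf{y} \in \mathbb{C}^M$ given by

$$
\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},
\tag{3.74}
$$

where $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is independent noise. Suppose the input distribution
is feasible whenever the symbol power satisfies $\mathbb{E}\{\|\mathbf{x}\|^2\} \leq q$. Let $\mathbf{H} \in \mathbb{C}^{M \times K}$
be a constant matrix known at the input and output with $r$ non-zero singular
values $s_1,\ldots,s_r$. The channel capacity is

$$
C = \sum_{k=1}^{r}\log_2\left(1 + \frac{q_k^{\mathrm{opt}}s_k^2}{N_0}\right) \quad \text{bit/symbol},
\tag{3.75}
$$

where

$$
q_k^{\mathrm{opt}} = \max\left(\mu - \frac{N_0}{s_k^2}, 0\right), \quad k = 1,\ldots,r,
\tag{3.76}
$$

and the variable $\mu$ is selected to make $\sum_{k=1}^{r} q_k^{\mathrm{opt}} = q$.

The capacity is achieved by the input distribution $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^H)$,
where $\mathbf{Q}^{\mathrm{opt}} = \mathrm{diag}(q_1^{\mathrm{opt}},\ldots,q_r^{\mathrm{opt}},0,\ldots,0)$ is a $K \times K$ diagonal matrix and $\mathbf{V}$
contains the ordered right singular vectors of $\mathbf{H}$.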
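The capacity-achieving input in Theorem 3.1 can be generated as $\mathbf{x} = \mathbf{V}(\mathbf{Q}^{\mathrm{opt}})^{1/2}\mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_K)$, since this vector has covariance $\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^H$. The Python sketch below does this for a hypothetical random $4 \times 3$ channel and checks the sample covariance and the average symbol power. The power values `q_opt` are placeholders standing in for the output of a water-filling computation, not results from the text.

```python
import numpy as np

rng = np.random.default_rng(8)
M, K = 4, 3
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
_, s, Vh = np.linalg.svd(H)
V = Vh.conj().T                          # ordered right singular vectors

q_opt = np.array([0.6, 0.3, 0.1])        # hypothetical powers with total q = 1
Q_opt = np.diag(q_opt)                   # K x K diagonal power matrix

# x = V Q^{1/2} z with z ~ N_C(0, I_K), so Cov{x} = V Q V^H
N = 200_000
Z = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
X = V @ (np.sqrt(q_opt)[:, None] * Z)    # each column is one channel use
sample_cov = X @ X.conj().T / N

print(np.round(sample_cov - V @ Q_opt @ V.conj().T, 3))
```

The residual printed at the end shrinks as the number of samples `N` grows, and `Vh @ X` recovers independent streams with powers `q_opt`, one per parallel channel.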


We have now proved that the transmitter should select the data signal
$\mathbf{x}$ to have the covariance matrix $\mathrm{Cov}\{\mathbf{x}\} = \mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^H$, where $\mathbf{V}$ contains the
right singular vectors of the channel matrix $\mathbf{H}$, as defined in Lemma 3.1. This
optimal choice diagonalizes the point-to-point MIMO channel into $r$ parallel
SISO channels with the channel gains $s_k^2$ for $k = 1,\ldots,r$. Recall that $s_k$ is
the $k$th singular value of the channel matrix $\mathbf{H}$. The singular values were
defined in Lemma 3.1 to be in decreasing order, which implies that $s_1$ is the
“strongest” channel and $s_r$ is the “weakest” channel with non-zero gain. This
fact is also reflected in how the transmitter allocates its transmit power over
the parallel channels. Suppose we know the optimal value of $\mu$ in (3.76). If
$\mu - \frac{N_0}{s_k^2} > 0$, then the transmitter allocates the power $q_k^{\mathrm{opt}} = \mu - \frac{N_0}{s_k^2}$ to the $k$th
parallel channel. Otherwise, it allocates no power to this channel: $q_k^{\mathrm{opt}} = 0$.
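Since the singular values are in decreasing order, it follows that

$$
\frac{N_0}{s_1^2} \leq \frac{N_0}{s_2^2} \leq \ldots \leq \frac{N_0}{s_r^2}
\tag{3.77}
$$

and, therefore,

$$
\mu - \frac{N_0}{s_1^2} \geq \mu - \frac{N_0}{s_2^2} \geq \ldots \geq \mu - \frac{N_0}{s_r^2}.
\tag{3.78}
$$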

Hence, a capacity-achieving transmitter allocates more power to a channel with
a stronger gain than to a weaker one. It might also set $q_k^{\mathrm{opt}} = 0$ for some of the
weakest channels, even if the channel gain is non-zero. Two properties govern
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Water = Symbol power q 
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Figure 3.12: The optimal power allocation for a point-to-point MIMO channel can be described 
as filling a tank with a volume of water corresponding to the total symbol power q. The height 
of each segment of the bottom of the tank is inversely proportional to the channel gain. 


the behavior. Logically, stronger channels should be allocated more power 
than weaker channels. However, the capacity expression in (3.75) contains the 


logarithmic function log,(1 + qk x) We recall from Section 3.1 that it grows 


linearly with qp as qt log,(e) when the SNR is small, but then grows at 
a slower and slower pace; therefore, it eventually becomes preferable to also 
allocate power to weaker channels (with smaller sz values) because these can 
initially deliver a linear capacity growth, even if the slope is weaker. 

This optimal power allocation solution is called water-filling since the 
implementation can be illustrated by filling a tank with an uneven bottom 
with water. This is illustrated in Figure 3.12 for the case of r = 4. The 
bottom is divided into four equal-sized segments representing each parallel 
channel. The segment related to channel $k$ has a height of $N_0/s_k^2$, and the power allocated to this channel is the water that is above it. When we pour water into the tank, it will first be allocated to the strongest channel. We continue pouring water until the water volume is $q$. If $q > N_0/s_2^2 - N_0/s_1^2$, the water level will eventually reach a point where the second channel is also used.
As we continue pouring water into the tank, the first and second channels 
will receive an equal share of the additional water until the point where also 
the third channel is activated. In the example shown in Figure 3.12, the total 
symbol power q is divided over the three strongest channels, while the fourth 
channel is not used, although it has a non-zero channel gain (i.e., the height 
of the bottom is finite). 
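The tank picture translates into a numerical procedure: raise the water level $\mu$ until the poured volume $\sum_{k=1}^{r}\max(\mu - N_0/s_k^2, 0)$ equals $q$. The sketch below does this by bisection on $\mu$, which is one simple root finder chosen for illustration (an assumption of this sketch, not a method from the text); the function name `water_filling` is hypothetical. The data of Example 3.10 later in this section serve as a check.

```python
def water_filling(s, q, N0=1.0, iters=200):
    """Water-filling power allocation (3.76) via bisection on the water level mu.

    Hypothetical helper. s: the r non-zero singular values (all positive),
    q: total symbol power (volume of water), N0: noise variance.
    """
    floors = [N0 / sk ** 2 for sk in s]  # bottom height N0/s_k^2 of each segment

    def poured(mu):
        # Volume of water above the bottom of the tank at water level mu
        return sum(max(mu - f, 0.0) for f in floors)

    lo = min(floors)      # no water has been poured at this level
    hi = min(floors) + q  # the poured volume at this level is at least q
    for _ in range(iters):  # a fixed iteration count avoids floating-point stalls
        mid = (lo + hi) / 2
        if poured(mid) > q:
            hi = mid
        else:
            lo = mid
    mu = (lo + hi) / 2
    return mu, [max(mu - f, 0.0) for f in floors]


# Example 3.10(c): s = (1, 1/2, 1/3, 1/4) and q/N0 = 434
mu, powers = water_filling([1, 1 / 2, 1 / 3, 1 / 4], q=434)
print(round(mu, 6), [round(p, 6) for p in powers])
```

The search converges to $\mu = 116N_0$ with the powers $115N_0$, $112N_0$, $107N_0$, and $100N_0$ of Example 3.10(c). Bisection is easy to reason about but only approximate; Corollary 3.3 gives an exact recipe.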
Figure 3.13: The rates achieved by the two parallel channels from Example 3.9 when optimal water-filling power allocation is used. When the weaker channel 2 begins to be used, it contributes equally much to the capacity growth as the stronger channel 1.


Example 3.9. Consider a MIMO channel with $r = 2$, $s_1^2/N_0 = 1$, and $s_2^2/N_0 = 1/4$. How is the transmit power allocated when using water-filling?

According to the water-filling expression in (3.76), we will select $q_2^{\mathrm{opt}} > 0$ if $\mu - N_0/s_2^2 = \mu - 4 > 0$, which implies that the water level must be $\mu > 4$. By contrast, for $\mu \in [1, 4]$, we assign all power to the strongest channel, resulting in $q_1^{\mathrm{opt}} = \mu - N_0/s_1^2 = \mu - 1 \in [0, 3]$. In the range $\mu > 4$ where both channels are used, they contribute equally to the capacity growth because

$$\log_2\left(1 + \frac{q_k^{\mathrm{opt}} s_k^2}{N_0}\right) = \log_2\left(1 + \left(\mu - \frac{N_0}{s_k^2}\right)\frac{s_k^2}{N_0}\right) = \log_2(\mu) + \log_2\left(\frac{s_k^2}{N_0}\right) \tag{3.79}$$

increases with $\mu$ in the same way regardless of the index $k$.

Figure 3.13 illustrates this behavior as a function of the total symbol power 
q, which is normalized in the sense of being dimensionless in this example. We 
notice that the rate of channel 1 grows rapidly in the beginning. However, for 
q > 3, we allocate the additional power q — 3 equally among the two channels, 
and this results in rate curves for the two channels that grow equally fast. 
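The equal growth of the two rates can be checked numerically. The sketch below (a hypothetical helper `rates`, with all powers normalized by $N_0$) applies the closed-form water-filling solution of Example 3.9: only channel 1 is active for $q \leq 3$, and otherwise $\mu = (q + 1 + 4)/2$ follows from $(\mu - 1) + (\mu - 4) = q$. Once both channels are active, the gap $R_1 - R_2$ stays at $\log_2(s_1^2/s_2^2) = 2$ bit, as (3.79) predicts.

```python
import math

gains = [1.0, 0.25]  # s_1^2/N0 and s_2^2/N0 from Example 3.9


def rates(q):
    """Per-channel rates log2(1 + q_k s_k^2/N0) under water-filling (hypothetical helper)."""
    floors = [1 / g for g in gains]  # N0/s_k^2 = 1 and 4
    if q <= floors[1] - floors[0]:   # water level at most 4: only channel 1 is used
        powers = [q, 0.0]
    else:                            # both used: (mu - 1) + (mu - 4) = q
        mu = (q + floors[0] + floors[1]) / 2
        powers = [mu - floors[0], mu - floors[1]]
    return [math.log2(1 + p * g) for p, g in zip(powers, gains)]


for q in [1, 3, 5, 10, 20]:
    r1, r2 = rates(q)
    print(f"q = {q:2d}: R1 = {r1:.3f}, R2 = {r2:.3f}, R1 - R2 = {r1 - r2:.3f}")
```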


The two extreme cases of the water-filling power allocation are illustrated in Figure 3.14. If the symbol power is low, only the strongest channel will be used, as shown in Figure 3.14(a). If the power is high, the total symbol power will be allocated over all the $r$ parallel channels. The stronger channels are always allocated more power than the weaker channels, but the relative difference gradually disappears. In fact, we get an asymptotically equal power allocation of $q/r$ per channel as $q \to \infty$. Notice that when we say "high" or "low" power in this context, it typically means that the SNR is high or low. As mentioned earlier, it is the fact that the logarithm grows slowly at higher SNRs that motivates the water-filling power allocation to use more than one channel when the strongest channel has reached a high SNR.

Figure 3.14: Illustration of the water-filling power allocation at (a) low power and (b) high power. (a) Only the strongest channel is used when the power is low. (b) All channels are used when the power is high, and the power allocation becomes almost equal.

The variable r is called the multiplexing gain of the point-to-point MIMO 
channel since it represents the number of parallel data streams the channel 
supports with non-zero channel gain. This is an important performance 
indicator when the water-filling power allocation assigns non-zero power to 
all the r channels (e.g., at high SNR) because then the MIMO capacity is 
roughly r times larger than the capacity of a corresponding SISO channel. 

To demonstrate how the multiplexing gain can greatly increase the capacity, we will compare a SISO channel with $|h|^2 = 1$, a SIMO/MISO channel with $\|\mathbf{h}\|^2 = M$, and a MIMO channel with $M = K$ in which all entries of the channel matrix $\mathbf{H}$ also have unit magnitude. The singular values of this MIMO channel satisfy $\sum_{k=1}^{M} s_k^2 = \mathrm{tr}(\mathbf{H}^{\mathrm{H}}\mathbf{H}) = MK = M^2$, but their individual values vary depending on how we select the phases of the individual entries in $\mathbf{H}$. Let us consider an "ideal" MIMO channel where all singular values are equal: $s_1 = \ldots = s_M = \sqrt{M}$.⁸ The MIMO capacity in (3.75) then becomes

$$C = \sum_{k=1}^{M} \log_2\left(1 + \frac{q}{M}\frac{s_k^2}{N_0}\right) = M\log_2\left(1 + \frac{q}{N_0}\right) \tag{3.80}$$

since $r = M$, $s_k^2 = M$, and equal power allocation $q_k^{\mathrm{opt}} = q/M$ is optimal. The value in (3.80) is exactly $M$ times larger than the corresponding SISO capacity $\log_2(1 + q/N_0)$ in (2.145). Moreover, the SIMO/MISO capacity is $\log_2(1 + Mq/N_0)$ in this example. The key difference from (3.80) is that the factor $M$ appears inside the logarithm instead of in front of it. This makes a huge difference when the SNR is large; the multiplexing gain is greatly preferred over a beamforming gain since the capacity grows linearly with $M$ instead of logarithmically. The multiplexing gain is also called the pre-log factor since it appears in front of the logarithm in the capacity expression.

Figure 3.15: The capacity in the MIMO, SIMO/MISO, and SISO cases over an ideal channel where all entries have unit magnitude and all singular values of $\mathbf{H}$ are equal. The MIMO capacity is $M\log_2(1 + \mathrm{SNR})$, the SIMO/MISO capacity is $\log_2(1 + M\,\mathrm{SNR})$, and the SISO capacity is $\log_2(1 + \mathrm{SNR})$.

We show the capacities in Figure 3.15 as a function of $\mathrm{SNR} = q/N_0$ for $r = M = 4$. Note that the SNR is shown in the decibel scale. The lowest curve is the SISO case, which represents the baseline performance. The SIMO/MISO case gives a curve with the same shape as in the SISO case, but it is shifted to the left by 6 dB, due to the beamforming gain of $M = 4$. The MIMO case gives the same capacity as the SIMO/MISO case at low SNR (when the logarithm is approximately a linear function), but then it grows much faster with the SNR thanks to the multiplexing gain. More precisely, at high SNR, the slope of the curve is $r = 4$ times steeper; therefore, the performance gain of having a MIMO channel becomes larger the higher the SNR becomes.

⁸ Equal singular values can be achieved by letting $\mathbf{H}$ be a unitary matrix that is scaled by a factor $\sqrt{M}$, which leads to an SVD with $\boldsymbol{\Sigma} = \sqrt{M}\mathbf{I}_M$. Two concrete examples are when $\mathbf{H}$ is a Hadamard matrix or a properly scaled discrete Fourier transform matrix. Section 4.4.3 describes a way to deploy practical antenna arrays to achieve equal singular values.
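To reproduce the qualitative behavior in Figure 3.15, the sketch below evaluates the three closed-form capacities for $M = 4$ at a few SNR values. It also checks footnote 8 on a $4 \times 4$ Hadamard matrix (built with the Sylvester construction, an assumption of this sketch): its Gram matrix is $M\mathbf{I}_M$, so all its singular values equal $\sqrt{M}$. The helper name `capacities` is hypothetical.

```python
import math

M = 4  # antennas at each side, as in Figure 3.15

# Footnote 8: a 4x4 Hadamard matrix (Sylvester construction) has unit-magnitude
# entries and H^T H = M I_M, so all its singular values equal sqrt(M)
H_had = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
gram = [[sum(H_had[m][i] * H_had[m][j] for m in range(M)) for j in range(M)]
        for i in range(M)]
assert gram == [[M if i == j else 0 for j in range(M)] for i in range(M)]


def capacities(snr):
    """Hypothetical helper: MIMO (3.80), SIMO/MISO, and SISO capacities for SNR = q/N0."""
    return M * math.log2(1 + snr), math.log2(1 + M * snr), math.log2(1 + snr)


for snr_db in [-10, 0, 10, 20, 30]:
    mimo, simo, siso = capacities(10 ** (snr_db / 10))
    print(f"{snr_db:3d} dB: MIMO {mimo:6.2f}, SIMO/MISO {simo:5.2f}, SISO {siso:5.2f}")
```

At $-10$ dB the MIMO and SIMO/MISO values are close, while at 30 dB the MIMO capacity is several times larger, mirroring the curves in the figure.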


Example 3.10. Consider a point-to-point MIMO channel where the channel matrix has the singular values $s_1 = 1$, $s_2 = 1/2$, $s_3 = 1/3$, and $s_4 = 1/4$. The optimal water-filling power allocation is used.

(a) If $q/N_0 = 2$, what is the optimal power allocation?

(b) For which values of $q/N_0$ are all singular values assigned non-zero power?

(c) If $q/N_0 = 434$, what is the optimal water-filling power allocation?

(a) We can notice from Figure 3.12 that only the strongest channel $s_1$ is utilized if $q < N_0/s_2^2 - N_0/s_1^2 = 2^2N_0 - N_0 = 3N_0$. This is the case when $q = 2N_0$; thus, the power allocation is $q_1 = q = 2N_0$ and $q_2 = q_3 = q_4 = 0$.

(b) All the parallel SISO channels are allocated non-zero power when the water height $\mu$ is above the height of the fourth segment in Figure 3.12. The breaking point occurs at $\mu = N_0/s_4^2 = 16N_0$, in which case the total power is

$$q = \sum_{k=1}^{4}\left(\mu - \frac{N_0}{s_k^2}\right) = 4\mu - N_0 - 4N_0 - 9N_0 - 16N_0 = 34N_0. \tag{3.81}$$

Hence, the four singular values are assigned non-zero power when $q/N_0 > 34$.

(c) All the channels are utilized since $q/N_0 = 434 > 34$. After filling the tank with the water volume $34N_0$, the remaining $434N_0 - 34N_0 = 400N_0$ is divided equally among the four channels. An additional $100N_0$ of water is added to each segment, resulting in the new water height $\mu = 116N_0$. The optimal power allocation is $q_1 = \mu - N_0 = 115N_0$, $q_2 = \mu - 4N_0 = 112N_0$, $q_3 = \mu - 9N_0 = 107N_0$, and $q_4 = \mu - 16N_0 = 100N_0$. This allocation is almost equal, which is expected when the transmit power is high.
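The thresholds in parts (a) and (b) follow one pattern: channel $n$ becomes active once the water level reaches its bottom $N_0/s_n^2$, which requires the volume $\sum_{k<n}(N_0/s_n^2 - N_0/s_k^2)$ above the stronger channels. The following sketch (with the hypothetical function name `activation_thresholds`) computes this activation threshold for every channel.

```python
def activation_thresholds(s, N0=1.0):
    """Total power at which water-filling starts to use each channel (hypothetical helper).

    Channel n is activated once the water level reaches its bottom N0/s_n^2, which
    requires the volume sum_{k<n} (N0/s_n^2 - N0/s_k^2) above the stronger channels.
    """
    floors = [N0 / sk ** 2 for sk in s]  # increasing because s is decreasing
    return [sum(floors[n] - floors[k] for k in range(n)) for n in range(len(floors))]


print(activation_thresholds([1, 1 / 2, 1 / 3, 1 / 4]))  # Example 3.10 with N0 = 1
```

For Example 3.10 this gives $0$, $3N_0$, $13N_0$, and $34N_0$: the second channel is used once $q > 3N_0$, as in part (a), and all four once $q > 34N_0$, as in part (b).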


In (3.80) and the last example, we assumed the MIMO channel matrix has the full rank $\min(M, K)$. The multiplexing gain $r$ is generally upper bounded as $r \leq \min(M, K)$. Hence, there is no need to transmit more parallel data streams than the minimum of the number of transmit and receive antennas. This explains why only one data stream was sent over the SIMO and MISO channels we considered earlier in this chapter. In some cases, $r$ is strictly smaller than $\min(M, K)$, so we have a lower multiplexing gain than in the ideal case. If the singular values are very different, we need a huge SNR before the water-filling power allocation uses all $r$ channels. It is only then that the entire multiplexing gain is helpful in practice. For a given channel matrix and power level, the effective multiplexing gain $N_+$ (i.e., the number of non-zero power variables) is more indicative of the multiplexing behavior. Even if $N_+ = 1$, having multiple antennas on both sides of the channel is beneficial because the singular value $s_1$ grows the more antennas are used.
Example 3.11. Consider a point-to-point MIMO system with the total symbol power $q$, the noise variance $N_0$, and the channel matrix

$$\mathbf{H} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}. \tag{3.82}$$

What is the channel capacity $C_{\mathrm{MIMO}}$? Compare it with the MISO channel capacity $C_{\mathrm{MISO}}$ obtained when only one of the receive antennas is used.

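We begin by computing the singular values of $\mathbf{H}$, which are the square roots of the eigenvalues of

$$\mathbf{H}\mathbf{H}^{\mathrm{H}} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}. \tag{3.83}$$

The eigenvalues can be obtained by solving the characteristic polynomial equation

$$0 = \det\left(\mathbf{H}\mathbf{H}^{\mathrm{H}} - \lambda\mathbf{I}_2\right) = \det\left(\begin{bmatrix} 2 - \lambda & 2 \\ 2 & 2 - \lambda \end{bmatrix}\right) = (2 - \lambda)^2 - 4, \tag{3.84}$$

from which we obtain $\lambda_1 = 4$ and $\lambda_2 = 0$. Hence, the singular values of $\mathbf{H}$ are $s_1 = \sqrt{4} = 2$ and $s_2 = 0$. The rank is $r = 1$, which is also the multiplexing gain. Since there is only one non-zero singular value, assigning all power to it is optimal: $q_1 = q$ and $q_2 = 0$. This results in the MIMO channel capacity

$$C_{\mathrm{MIMO}} = \log_2\left(1 + \frac{4q}{N_0}\right) \quad \text{bit/symbol}. \tag{3.85}$$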

When the receiver only uses a single antenna, we obtain a MISO channel with the channel vector $\mathbf{h} = [1, 1]^{\mathrm{T}}$, which contains the entries of one of the rows of $\mathbf{H}$. The corresponding MISO channel capacity is obtained from (3.47) as

$$C_{\mathrm{MISO}} = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) = \log_2\left(1 + \frac{2q}{N_0}\right) \quad \text{bit/symbol}. \tag{3.86}$$

The MIMO channel obtains a beamforming gain of $MK = 4$, while the MISO channel only achieves a beamforming gain of $K = 2$. Hence, the MIMO channel has a distinct benefit even when the multiplexing gain is $r = 1$.
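The eigenvalue computation in (3.84) can be mirrored in a few lines. Assuming a hypothetical value $q/N_0 = 10$, the sketch below obtains the eigenvalues of $\mathbf{H}\mathbf{H}^{\mathrm{H}}$ from the trace and determinant of the $2 \times 2$ matrix (the two roots of its characteristic polynomial) and compares $C_{\mathrm{MIMO}}$ in (3.85) with $C_{\mathrm{MISO}}$ in (3.86).

```python
import math

H = [[1, 1], [1, 1]]  # channel matrix from (3.82)

# Gram matrix H H^H (H is real here, so H^H = H^T)
G = [[sum(H[i][k] * H[j][k] for k in range(2)) for j in range(2)] for i in range(2)]

# Roots of the characteristic polynomial of a 2x2 matrix via trace and determinant
tr = G[0][0] + G[1][1]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
disc = math.sqrt(tr ** 2 - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]
s = [math.sqrt(max(lam, 0.0)) for lam in eigs]  # singular values of H

snr = 10.0  # hypothetical value of q/N0
c_mimo = math.log2(1 + s[0] ** 2 * snr)  # all power on the only non-zero s_1
c_miso = math.log2(1 + 2 * snr)          # ||h||^2 = 2 with one receive antenna
print(eigs, s, round(c_mimo, 3), round(c_miso, 3))
```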


We will now take a closer look at the water-filling power allocation. The variable $\mu$ represents the water level in Figure 3.12. Recall that this variable equals $\mu_{N_+}$ in (3.72) for some $N_+ \in \{1, \ldots, r\}$. For each potential value of $N_+$, we can verify if $\mu = \mu_{N_+}$ indeed gives $N_+$ non-zero powers in (3.76); nothing more and nothing less. This implies that we must have $\mu_{N_+} - N_0/s_{N_+}^2 > 0$ and $\mu_{N_+} - N_0/s_{N_+ + 1}^2 \leq 0$. Only one value of $N_+$ satisfies both conditions because the water level is always between two consecutive segments in Figure 3.12. Hence, the recipe for computing the optimal water level is as follows.
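Corollary 3.3. The optimal water level is

$$\mu = \begin{cases} \mu_1, & \text{if } \mu_1 - \dfrac{N_0}{s_2^2} \leq 0, \\[2mm] \mu_{N_+}, & \text{if } \mu_{N_+} - \dfrac{N_0}{s_{N_+}^2} > 0 \text{ and } \mu_{N_+} - \dfrac{N_0}{s_{N_+ + 1}^2} \leq 0, \text{ for } N_+ \in \{2, \ldots, r - 1\}, \\[2mm] \mu_r, & \text{if } \mu_r - \dfrac{N_0}{s_r^2} > 0, \end{cases} \tag{3.87}$$

where $\mu_{N_+}$ is given in (3.72) for $N_+ \in \{1, \ldots, r\}$.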


Since the conditions in Corollary 3.3 hold for only one of the $r$ possible values of $\mu$, one way to implement the water-filling power allocation is to start by computing $\mu_r$ and checking whether its condition in (3.87) holds. If not, we compute $\mu_{r-1}$ and check whether its conditions hold. We continue until we find the $\mu$ for which the conditions in (3.87) hold, and this is the optimal solution.
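Corollary 3.3 turns into a short loop. The sketch below assumes that (3.72) has the form $\mu_{N_+} = \frac{1}{N_+}\left(q + \sum_{k=1}^{N_+} N_0/s_k^2\right)$, which is the expression used in the computations of Example 3.12, and tests $N_+ = r, r-1, \ldots, 1$ in that order. The function name `water_level` is hypothetical.

```python
import math


def water_level(s, q, N0=1.0):
    """Exact water level via Corollary 3.3 (hypothetical helper).

    s must hold the r non-zero singular values in decreasing order.
    Returns (N+, mu, powers) with the powers given by (3.76).
    """
    floors = [N0 / sk ** 2 for sk in s]  # N0/s_k^2, increasing in k
    r = len(floors)
    for n in range(r, 0, -1):                     # N+ = r, r-1, ..., 1
        mu = (q + sum(floors[:n])) / n            # assumed form of mu_{N+} in (3.72)
        lower_ok = mu - floors[n - 1] > 0         # channel N+ receives positive power
        upper_ok = n == r or mu - floors[n] <= 0  # channel N+ + 1 receives no power
        if lower_ok and upper_ok:
            return n, mu, [max(mu - f, 0.0) for f in floors]
    raise ValueError("no valid water level found; q must be positive")


# Data of Example 3.12: N0/s_k^2 = 1, 3, 5, 6, 7, 10, 16 and q/N0 = 23
s = [1 / math.sqrt(f) for f in [1, 3, 5, 6, 7, 10, 16]]
n_plus, mu, powers = water_level(s, q=23)
print(n_plus, round(mu, 6), [round(p, 6) for p in powers])
```

Unlike a numerical search, this loop returns the water level exactly (up to rounding), and it stops as soon as the unique valid $N_+$ is found.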


Example 3.12. Consider a point-to-point MIMO channel with the $r = 7$ non-zero singular values $s_1 = 1$, $s_2 = 1/\sqrt{3}$, $s_3 = 1/\sqrt{5}$, $s_4 = 1/\sqrt{6}$, $s_5 = 1/\sqrt{7}$, $s_6 = 1/\sqrt{10}$, and $s_7 = 1/4$. What is the water-filling power allocation if $q/N_0 = 23$?

We must identify the optimal water level to find the capacity-achieving power allocation. Corollary 3.3 provides the options $\mu_1, \ldots, \mu_7$, along with their respective optimality conditions. We begin by computing $\mu_7$ using (3.72):

$$\mu_7 = \frac{N_0}{7}\left(23 + 1 + 3 + 5 + 6 + 7 + 10 + 16\right) = \frac{71}{7}N_0. \tag{3.88}$$

We notice that $\mu_7 - N_0/s_7^2 = \frac{71}{7}N_0 - 16N_0 < 0$; thus, the condition in (3.87) is not satisfied. We continue by computing $\mu_6$ using (3.72), which results in

$$\mu_6 = \frac{N_0}{6}\left(23 + 1 + 3 + 5 + 6 + 7 + 10\right) = \frac{55}{6}N_0. \tag{3.89}$$

We notice that $\mu_6 - N_0/s_7^2 = \frac{55}{6}N_0 - 16N_0 < 0$ but $\mu_6 - N_0/s_6^2 = \frac{55}{6}N_0 - 10N_0 < 0$, so $\mu_6$ does not satisfy its optimality conditions in (3.87). Next, we compute

$$\mu_5 = \frac{N_0}{5}\left(23 + 1 + 3 + 5 + 6 + 7\right) = 9N_0. \tag{3.90}$$

We note that $\mu_5 - N_0/s_6^2 = 9N_0 - 10N_0 < 0$ and $\mu_5 - N_0/s_5^2 = 9N_0 - 7N_0 > 0$; hence, the optimality conditions in Corollary 3.3 are satisfied. Since only one water level satisfies its conditions, there is no need to consider $\mu_4, \ldots, \mu_1$.

In conclusion, $N_+ = 5$ and $\mu_5 = 9N_0$ is the optimal water level. Substituting these values into (3.76), we obtain the water-filling power allocation $q_1^{\mathrm{opt}} = 8N_0$, $q_2^{\mathrm{opt}} = 6N_0$, $q_3^{\mathrm{opt}} = 4N_0$, $q_4^{\mathrm{opt}} = 3N_0$, $q_5^{\mathrm{opt}} = 2N_0$, and $q_6^{\mathrm{opt}} = q_7^{\mathrm{opt}} = 0$.




In practical systems, we cannot operate at arbitrary data rates but only at those achievable by predefined MCS combinations; for example, those listed in Table 2.18 for 5G NR. For stream $k \in \{1, \ldots, r\}$, we should select an MCS delivering a number of bits per symbol that is close to but smaller than the capacity $\log_2(1 + q_k s_k^2/N_0)$ of that stream. The $r$ streams will generally use
different MCSs. The water-filling solution is not optimal when considering 
this mapping between the continuous channel capacity and the discrete set of 
data rates supported by the available MCS combinations. In particular, one 
can sometimes modify the power allocation to push some streams to the next 
row in the table (i.e., achieve a larger data rate) without reducing the other 
ones. This principle is called mercury/water-filling and is described in [50]. 


3.4.1 Geometric Interpretation of MIMO Transmission 


We will now provide a basic physical interpretation of how we achieve the 
MIMO capacity. Let us write the $K \times K$ matrix $\mathbf{V}$ from the SVD of the channel matrix as $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_K]$, where $\mathbf{v}_k$ is the $k$th column. To achieve the capacity, the transmitter sends the signal vector

$$\mathbf{x} = \mathbf{V}\boldsymbol{\varsigma} = \sum_{k=1}^{K}\mathbf{v}_k\varsigma_k, \tag{3.91}$$

which consists of $K$ data signals $\varsigma_1, \ldots, \varsigma_K$, each being multiplied by a column $\mathbf{v}_k$ from $\mathbf{V}$ that acts as a precoding vector. This is a generalization of the MISO setup in (3.39) where we only sent one data signal multiplied by one precoding vector. We call this type of transmission spatial multiplexing since we send (up to) $K$ signals simultaneously, but with different spatial directivity determined by the precoding vectors. These vectors are mutually orthogonal since $\mathbf{V}^{\mathrm{H}}\mathbf{V} = \mathbf{I}_K$ but might be assigned different symbol powers since $\varsigma_k \sim \mathcal{N}_{\mathbb{C}}(0, q_k)$ for $k = 1, \ldots, K$. We call $\mathbf{V}$ the precoding matrix.
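Similarly, let us write the $M \times M$ matrix $\mathbf{U}$ from the SVD as $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_M]$, where $\mathbf{u}_m$ is the $m$th column. When the receiver computes $\tilde{\mathbf{y}} = \mathbf{U}^{\mathrm{H}}\mathbf{y}$, it obtains

$$\tilde{\mathbf{y}} = \begin{bmatrix} \mathbf{u}_1^{\mathrm{H}}\mathbf{y} \\ \vdots \\ \mathbf{u}_M^{\mathrm{H}}\mathbf{y} \end{bmatrix}, \tag{3.92}$$

which can be interpreted as applying $M$ different receive combining vectors, in the same way as we did with one combining vector in the SIMO case in (3.16). The receive combining vectors are mutually orthogonal since $\mathbf{U}^{\mathrm{H}}\mathbf{U} = \mathbf{I}_M$. Since the precoding vectors $\mathbf{v}_1, \ldots, \mathbf{v}_K$ and combining vectors $\mathbf{u}_1, \ldots, \mathbf{u}_M$ are selected based on the SVD of the channel matrix, it follows that

$$\mathbf{u}_m^{\mathrm{H}}\mathbf{y} = \mathbf{u}_m^{\mathrm{H}}\left(\mathbf{H}\sum_{k=1}^{K}\mathbf{v}_k\varsigma_k + \mathbf{n}\right) = s_m\varsigma_m + \mathbf{u}_m^{\mathrm{H}}\mathbf{n}, \quad m = 1, \ldots, r. \tag{3.93}$$

This is the first row in (3.63). The precoding and combining vectors $\mathbf{v}_m, \mathbf{u}_m$ for $m > r$ are not used since no data signal can reach the receiver through them because the corresponding singular values are zero.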

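The vectors $\mathbf{u}_1, \ldots, \mathbf{u}_M$ are called the left singular vectors of $\mathbf{H}$, while $\mathbf{v}_1, \ldots, \mathbf{v}_K$ are called the right singular vectors. Using this notation, we can decompose the MIMO channel matrix as

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathrm{H}} = \sum_{k=1}^{r} s_k\mathbf{u}_k\mathbf{v}_k^{\mathrm{H}}, \tag{3.94}$$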


in the same way as we did for the eigendecomposition in (2.40). The above 
decomposition can be verified by directly computing the matrix entries on the 
right-hand side. Hence, the channel matrix consists of r components, which 
might represent different propagation paths. Figure 3.16 provides a rough 
geometric interpretation for the case with K = 3 transmit antennas, M = 3 
receive antennas, and r = 3. In this figure, each of the three components 
is represented by one physical propagation path, which either is the direct 
path between the transmitter and receiver, or a path where the transmitted 
signal bounces off a scattering object before reaching the receiver. The channel 
responses of the respective three paths are $s_1, s_2, s_3$, which are the singular values of the channel matrix. To achieve the MIMO capacity, the transmitter should precode its signals to transmit along the three beams indicated in Figure 3.16. The receiver "listens" to these signals by applying the corresponding receive combining vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$. The water-filling power allocation determines how much of the total power is assigned to each path.
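The claim that precoding with $\mathbf{v}_k$ and combining with $\mathbf{u}_m$ creates parallel, non-interfering channels can be checked on a small example. The sketch below builds a hypothetical $2 \times 2$ channel from chosen orthonormal vectors and singular values via (3.94), and evaluates the noise-free part $\mathbf{u}_m^{\mathrm{H}}\mathbf{H}\mathbf{v}_k$ of the combined signal; only the terms with $m = k$ are non-zero, and they equal $s_k$. The vectors, singular values, and the helper name `gain` are illustrative assumptions.

```python
import math

r2 = math.sqrt(2)
# Hypothetical orthonormal left and right singular vectors and singular values
u = [[1 / r2, 1 / r2], [1 / r2, -1 / r2]]    # u_1, u_2
v = [[1 / r2, 1j / r2], [1 / r2, -1j / r2]]  # v_1, v_2 (complex-valued)
s = [3.0, 1.0]                               # s_1 >= s_2 > 0

# H = sum_k s_k u_k v_k^H as in (3.94); entry (i, j) is sum_k s_k u_k[i] conj(v_k[j])
H = [[sum(s[k] * u[k][i] * v[k][j].conjugate() for k in range(2)) for j in range(2)]
     for i in range(2)]


def gain(m, k):
    """Hypothetical helper: noise-free u_m^H H v_k (stream k seen by combiner m)."""
    Hv = [sum(H[i][j] * v[k][j] for j in range(2)) for i in range(2)]  # H v_k
    return sum(u[m][i].conjugate() * Hv[i] for i in range(2))          # u_m^H H v_k


for m in range(2):
    print([round(abs(gain(m, k)), 9) for k in range(2)])
```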

This example exposes another critical difference between having multiple 
antennas at the transmitter/receiver versus having a single directive antenna 
at the transmitter/receiver. A directive antenna can only transmit/receive 
with one directivity, while multiple beams with different directivity (each 
adapted to the MIMO channel) are needed to achieve a multiplexing gain. 

It is important to note that a direct mapping between precoding/combining 
vectors and physical propagation paths, as sketched in Figure 3.16, is not 
possible in general. It mainly happens when a few propagation paths (compared 
to the number of antennas) are spread out spatially. In all other cases, each 
component $s_k\mathbf{u}_k\mathbf{v}_k^{\mathrm{H}}$ in (3.94) represents some complicated linear combination
of many different propagation paths, which happen to lead to an orthogonal 
transmission. Hence, when talking about the spatial directivity of an M- 
dimensional precoding/combining vector, this should not be interpreted as a 
distinct angular direction in our three-dimensional world but as the direction 
of a vector in an M-dimensional vector space. We will return to this matter in 
later chapters when we study MIMO channels in different deployment scenarios 
and identify ways to generate the channel matrices from the geometry of the 
propagation environment. 
Figure 3.16: The SVD divides the channel matrix $\mathbf{H}$ into $r$ paths of the form $s_k\mathbf{u}_k\mathbf{v}_k^{\mathrm{H}}$, where $s_k$ describes the channel gain of the $k$th path, $\mathbf{v}_k$ describes the spatial direction of the path seen from the transmitter, and $\mathbf{u}_k$ describes the spatial direction seen from the receiver.


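Example 3.13. Consider a MIMO channel matrix that is decomposed as

$$\mathbf{H} = 3\mathbf{a}_1\mathbf{b}_1^{\mathrm{H}} + \mathbf{a}_2\mathbf{b}_2^{\mathrm{H}}, \tag{3.95}$$

where $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{b}_1$, and $\mathbf{b}_2$ are given vectors.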


What is the multiplexing gain of this channel? What are the channel gains of 
the SISO channels through which parallel data streams can be sent? 

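We first notice that $\mathbf{a}_1$ and $\mathbf{a}_2$ are orthogonal since $\mathbf{a}_1^{\mathrm{H}}\mathbf{a}_2 = 0$. Moreover, $\mathbf{b}_1$ and $\mathbf{b}_2$ are orthogonal since $\mathbf{b}_1^{\mathrm{H}}\mathbf{b}_2 = 0$. Hence, the given decomposition can be used to obtain the SVD as in (3.94). Recalling that the left and right singular vectors have unit norms, we can obtain them as

$$\mathbf{u}_1 = \frac{\mathbf{a}_1}{\|\mathbf{a}_1\|}, \quad \mathbf{u}_2 = \frac{\mathbf{a}_2}{\|\mathbf{a}_2\|}, \quad \mathbf{v}_1 = \frac{\mathbf{b}_1}{\|\mathbf{b}_1\|}, \quad \mathbf{v}_2 = \frac{\mathbf{b}_2}{\|\mathbf{b}_2\|}. \tag{3.97}$$

Substituting (3.97) into (3.95) gives $\mathbf{H} = 3\|\mathbf{a}_1\|\|\mathbf{b}_1\|\mathbf{u}_1\mathbf{v}_1^{\mathrm{H}} + \|\mathbf{a}_2\|\|\mathbf{b}_2\|\mathbf{u}_2\mathbf{v}_2^{\mathrm{H}}$. Accordingly, the singular values are $s_1 = 3\|\mathbf{a}_1\|\|\mathbf{b}_1\| = 3\sqrt{5}$ and $s_2 = \|\mathbf{a}_2\|\|\mathbf{b}_2\| = \sqrt{10}$. The multiplexing gain is $r = 2$ since the rank of $\mathbf{H}$ is two. The channel gains of the parallel SISO channels are $s_1^2 = 45$ and $s_2^2 = 10$.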
3.4.2 Duality and Alternative Capacity Expressions


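The channel capacity of the MIMO channel is determined by the total symbol power $q$, the singular values of $\mathbf{H}$, and the noise variance $N_0$. Suppose we instead transmit with power $q$ in the opposite direction; that is, over the MIMO channel $\mathbf{H}^{\mathrm{T}}$ from $M$ transmit antennas to $K$ receive antennas. The SVD of this channel matrix is $\mathbf{H}^{\mathrm{T}} = (\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathrm{H}})^{\mathrm{T}} = \mathbf{V}^{*}\boldsymbol{\Sigma}^{\mathrm{T}}\mathbf{U}^{\mathrm{T}}$. Since $\boldsymbol{\Sigma}^{\mathrm{T}}$ has the same diagonal values as $\boldsymbol{\Sigma}$, the singular values of $\mathbf{H}$ and $\mathbf{H}^{\mathrm{T}}$ coincide, and the water-filling power allocation will be identical if $N_0$ is also unchanged. We have obtained the following result.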


Corollary 3.4. The capacity of the MIMO channel with channel matrix $\mathbf{H}$ is the same as the capacity of the MIMO channel with channel matrix $\mathbf{H}^{\mathrm{T}}$ if the transmit power to noise power ratio is the same.


This corollary establishes a strong connection between a primal system with channel matrix $\mathbf{H}$ and a dual system with channel matrix $\mathbf{H}^{\mathrm{T}}$. The
fact that the capacity is the same in both directions of a communication 
channel is called duality. One instance of the duality is that SIMO and MISO 
channels have the same capacity, as we previously observed in this chapter. 
Duality might not be achieved in practice because different devices might have 
different transmit power (recall the comparison between uplink and downlink 
in Figure 1.7) and noise power due to different hardware characteristics. 


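Example 3.14. What is the capacity-achieving input distribution for the dual system with channel matrix $\mathbf{H}^{\mathrm{T}}$? Assume $q$ and $N_0$ remain the same.

The optimal input distribution is $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}})$ for the primal system, which depends on the right singular vectors $\mathbf{V}$ of $\mathbf{H} \in \mathbb{C}^{M \times K}$ and the $K \times K$ power allocation matrix $\mathbf{Q}^{\mathrm{opt}} = \mathrm{diag}(q_1^{\mathrm{opt}}, \ldots, q_r^{\mathrm{opt}}, 0, \ldots, 0)$ computed using Theorem 3.1. The SVD of the dual channel matrix is $\mathbf{H}^{\mathrm{T}} = \mathbf{V}^{*}\boldsymbol{\Sigma}^{\mathrm{T}}\mathbf{U}^{\mathrm{T}}$, which instead has the matrix $\mathbf{U}^{*} \in \mathbb{C}^{M \times M}$ containing its right singular vectors. Even if the water-filling power allocation is the same in the primal and dual systems, the number of transmit antennas might differ, so the power allocation matrix for the dual system must be defined differently: $\tilde{\mathbf{Q}}^{\mathrm{opt}} = \mathrm{diag}(q_1^{\mathrm{opt}}, \ldots, q_r^{\mathrm{opt}}, 0, \ldots, 0)$ is an $M \times M$ diagonal matrix. The first $r$ diagonal entries are the same as in $\mathbf{Q}^{\mathrm{opt}}$ but are followed by $M - r$ zero-valued entries.

In conclusion, since $(\mathbf{U}^{*})^{\mathrm{H}} = \mathbf{U}^{\mathrm{T}}$, the capacity-achieving input distribution for the dual system is $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{U}^{*}\tilde{\mathbf{Q}}^{\mathrm{opt}}\mathbf{U}^{\mathrm{T}})$.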


There are several alternative ways to express the MIMO capacity. Recall 
that we can multiply the capacity expression in (3.75) with B to change the 
unit to bit/s. This leads to the alternative but equivalent way to write the 
capacity of a MIMO channel as

$$C = B\sum_{k=1}^{r}\log_2\left(1 + \frac{q_k^{\mathrm{opt}} s_k^2}{N_0}\right) \quad \text{bit/s}. \tag{3.98}$$

By substituting the expression for $q_k^{\mathrm{opt}}$ in (3.76) into (3.98), we obtain

$$C = \sum_{k=1}^{r}\max\left(B\log_2\left(\frac{\mu s_k^2}{N_0}\right), 0\right) \quad \text{bit/s}. \tag{3.99}$$
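Since (3.99) is obtained from (3.98) by substitution, the two bit/s expressions must agree for any water level. The sketch below checks this for the singular values of Example 3.10 and the water levels found there, assuming a hypothetical bandwidth $B = 1$ MHz and $N_0 = 1$; the function name is also hypothetical.

```python
import math


def capacity_both_forms(s, mu, N0, B):
    """Hypothetical helper: evaluate (3.98) and (3.99) for a given water level mu."""
    c98 = sum(B * math.log2(1 + max(mu - N0 / sk ** 2, 0.0) * sk ** 2 / N0) for sk in s)
    c99 = sum(max(B * math.log2(mu * sk ** 2 / N0), 0.0) for sk in s)
    return c98, c99


s = [1, 1 / 2, 1 / 3, 1 / 4]  # singular values from Example 3.10
for mu in [3.0, 116.0]:       # water levels for q/N0 = 2 and q/N0 = 434
    c98, c99 = capacity_both_forms(s, mu, N0=1.0, B=1e6)  # B = 1 MHz (assumed)
    print(f"mu = {mu}: (3.98) gives {c98:.1f} bit/s, (3.99) gives {c99:.1f} bit/s")
```

The form (3.99) is convenient because inactive channels are removed by the outer $\max(\cdot, 0)$ without computing their powers explicitly.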
k=1 No 


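The capacity expression in (3.75) can also be rewritten using the determinant in the following way:

$$\begin{aligned} C &= \sum_{k=1}^{r}\log_2\left(1 + \frac{q_k^{\mathrm{opt}} s_k^2}{N_0}\right) = \log_2\left(\prod_{k=1}^{r}\left(1 + \frac{q_k^{\mathrm{opt}} s_k^2}{N_0}\right)\right) \\ &= \log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\boldsymbol{\Sigma}\mathbf{Q}^{\mathrm{opt}}\boldsymbol{\Sigma}^{\mathrm{H}}\right)\right) \\ &= \log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\right)\right), \end{aligned} \tag{3.100}$$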


where the last step follows from some matrix algebra that exploits the fact that $\mathbf{U}$ is a unitary matrix.⁹ We notice that $\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}$ in (3.100) is the covariance matrix of the transmitted signal in Theorem 3.1. It is a quadratic form containing the optimal precoding matrix $\mathbf{V}$ and the optimal diagonal power allocation matrix $\mathbf{Q}^{\mathrm{opt}}$. Moreover, the matrix inside the determinant in (3.100) is the covariance matrix of the received signal $\mathbf{y}$, divided by the noise variance $N_0$, so we can also express the MIMO capacity as $C = \log_2\left(\det\left(\frac{1}{N_0}\mathrm{Cov}\{\mathbf{y}\}\right)\right)$. Hence, among all transmissions that satisfy the power constraint, the capacity-achieving one is the one that maximizes the determinant of the received signal's covariance matrix.
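The equality between the sum of per-stream rates and the determinant form in (3.100) can be verified numerically. The sketch below constructs a hypothetical $2 \times 2$ channel $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathrm{H}}$ from chosen unitary matrices and singular values, computes the water-filling powers in closed form for $r = 2$ (valid here because both channels are active), and compares both sides. $\mathbf{U}$ is deliberately chosen with $\det(\mathbf{U}) = -j$ to illustrate that only $|\det(\mathbf{U})| = 1$ matters. The helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Hypothetical helper: product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def herm(A):
    """Hypothetical helper: conjugate transpose A^H."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]


r2 = math.sqrt(2)
U = [[1 / r2, 1 / r2], [1j / r2, -1j / r2]]  # hypothetical unitary U with det(U) = -j
V = [[1 / r2, 1 / r2], [1 / r2, -1 / r2]]    # hypothetical unitary V
s, N0, q = [2.0, 0.5], 1.0, 10.0
H = matmul(matmul(U, [[s[0], 0], [0, s[1]]]), herm(V))  # H = U Sigma V^H

# Water-filling for r = 2; q is large enough here that both channels are active
mu = (q + N0 / s[0] ** 2 + N0 / s[1] ** 2) / 2
Q = [[mu - N0 / s[0] ** 2, 0], [0, mu - N0 / s[1] ** 2]]

# Sum of per-stream rates: the first expression in (3.100)
lhs = sum(math.log2(1 + Q[k][k] * s[k] ** 2 / N0) for k in range(2))

# log2 det(I_M + H V Q V^H H^H / N0): the last expression in (3.100)
A = matmul(matmul(matmul(H, V), Q), matmul(herm(V), herm(H)))
D = [[(1 if i == j else 0) + A[i][j] / N0 for j in range(2)] for i in range(2)]
rhs = math.log2(abs(D[0][0] * D[1][1] - D[0][1] * D[1][0]))
print(round(lhs, 9), round(rhs, 9))
```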


3.4.3 Arbitrary Precoding and Successive Interference Cancellation 


There are various reasons for not transmitting in a capacity-achieving way in 
practice, such as having imperfect channel knowledge at the transmitter or 
limited hardware capabilities. We will return to such issues in later chapters 
but cover the fundamental theory here. Recall that the capacity-achieving 
precoding creates many parallel SISO channels, as illustrated in Figure 3.11(b). 
This will not happen when suboptimal precoding is used to transmit multiple 
data streams; thus, the spatially multiplexed signals partially collide at the 
receiver and must be appropriately decoded to deal with mutual interference. 


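⁹ For any $M \times M$ square matrices $\mathbf{A}$ and $\mathbf{B}$, it holds that $\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A})\det(\mathbf{B})$. We can utilize this property to achieve the last equality in (3.100): $\det(\mathbf{I}_M + \frac{1}{N_0}\boldsymbol{\Sigma}\mathbf{Q}^{\mathrm{opt}}\boldsymbol{\Sigma}^{\mathrm{H}}) = \det(\mathbf{U}^{\mathrm{H}}\mathbf{U} + \frac{1}{N_0}\mathbf{U}^{\mathrm{H}}\mathbf{H}\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{U}) = \det(\mathbf{U}^{\mathrm{H}})\det(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}})\det(\mathbf{U})$, where $\det(\mathbf{U}^{\mathrm{H}})\det(\mathbf{U}) = \det(\mathbf{U}^{\mathrm{H}}\mathbf{U}) = \det(\mathbf{I}_M) = 1$ since $\mathbf{U}$ is a unitary matrix.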
Suppose the transmitted signal is generated as

$$\mathbf{x} = \mathbf{P}\boldsymbol{\varsigma} = \sum_{k=1}^{K}\mathbf{p}_k\varsigma_k \tag{3.101}$$

using an arbitrary precoding matrix $\mathbf{P} = [\mathbf{p}_1, \ldots, \mathbf{p}_K] \in \mathbb{C}^{K \times K}$ with unit-norm columns $\mathbf{p}_k$ to send the $K$ independent data signals in $\boldsymbol{\varsigma} = [\varsigma_1, \ldots, \varsigma_K]^{\mathrm{T}}$ in different spatial directions. As the complex Gaussian input distribution is capacity-achieving, we assume that $\boldsymbol{\varsigma} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{Q})$, where $\mathbf{Q} = \mathrm{diag}(q_1, \ldots, q_K)$ is an arbitrary diagonal power allocation matrix satisfying the power constraint $\sum_{k=1}^{K} q_k \leq q$. This is a feasible way to communicate over a MIMO channel, but it is suboptimal unless we select $\mathbf{P} = \mathbf{V}$ and $\mathbf{Q} = \mathbf{Q}^{\mathrm{opt}}$. Before deriving the achievable rate with an arbitrary $\mathbf{P}$, we begin with a helpful example.


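Example 3.15. Suppose we use a fixed precoding vector $\mathbf{p} \in \mathbb{C}^{K}$ to transmit a data signal $\varsigma \sim \mathcal{N}_{\mathbb{C}}(0, q)$ over a MIMO channel so that the received signal is

$$\mathbf{y} = \mathbf{H}\mathbf{p}\varsigma + \mathbf{n}, \tag{3.102}$$

where the noise has the colored distribution $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{C})$ for some invertible covariance matrix $\mathbf{C} \in \mathbb{C}^{M \times M}$. What is the capacity of this channel?

With a fixed precoding vector, the MIMO channel effectively becomes a SIMO channel with the channel vector $\mathbf{H}\mathbf{p}$. An unusual property is that the noise is colored since $\mathbf{C}$ is generally not an identity matrix. This can be dealt with using the whitening procedure in (2.86), by transforming (3.102) as

$$\mathbf{C}^{-1/2}\mathbf{y} = \mathbf{C}^{-1/2}\mathbf{H}\mathbf{p}\varsigma + \underbrace{\mathbf{C}^{-1/2}\mathbf{n}}_{\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)}. \tag{3.103}$$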
The whitening operation is reversible (i.e., it causes no information loss), so we can use (3.103) to compute the capacity. We now have a received signal with white noise as in Corollary 3.1 and with the effective SIMO channel vector $\mathbf{C}^{-1/2}\mathbf{H}\mathbf{p}$. The capacity with the fixed precoding becomes

$$C = \log_2\left(1 + \frac{q\|\mathbf{C}^{-1/2}\mathbf{H}\mathbf{p}\|^2}{N_0}\right) = \log_2\left(1 + \frac{q}{N_0}\mathbf{p}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}^{-1}\mathbf{H}\mathbf{p}\right). \tag{3.104}$$

This capacity is achieved when applying MRC to (3.103) based on the effective SIMO channel vector. Hence, we need to apply the combining vector

$$\mathbf{w} = \mathbf{C}^{-1/2}\mathbf{C}^{-1/2}\mathbf{H}\mathbf{p} = \mathbf{C}^{-1}\mathbf{H}\mathbf{p} \tag{3.105}$$

to the original received signal in (3.102) to first perform whitening and then MRC. This vector is equal to the LMMSE combining vector in Example 3.4, except for a scaling factor, so we will use that terminology in this section.
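The equivalence between whitening followed by MRC and the single combining vector in (3.105) can be checked numerically. The sketch below uses a hypothetical effective channel $\mathbf{h} = \mathbf{H}\mathbf{p}$ and a hypothetical $2 \times 2$ noise covariance $\mathbf{C}$, and compares the post-combining SNR of $\mathbf{w} = \mathbf{C}^{-1}\mathbf{H}\mathbf{p}$ with the value $\frac{q}{N_0}\mathbf{p}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}^{-1}\mathbf{H}\mathbf{p}$ inside (3.104), as well as with plain MRC ($\mathbf{w} = \mathbf{H}\mathbf{p}$) that ignores the noise coloring. All numbers and helper names are illustrative assumptions.

```python
import math


def inv2(C):
    """Hypothetical helper: inverse of an invertible 2x2 matrix."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    return [[C[1][1] / det, -C[0][1] / det], [-C[1][0] / det, C[0][0] / det]]


def quad(a, M, b):
    """Hypothetical helper: quadratic form a^H M b for 2-vectors and a 2x2 matrix."""
    return sum(a[i].conjugate() * M[i][j] * b[j] for i in range(2) for j in range(2))


h = [1 + 1j, 0.5 - 0.2j]         # hypothetical effective channel H p
C = [[1.0, 0.6j], [-0.6j, 2.0]]  # hypothetical Hermitian positive definite C
q, N0 = 2.0, 1.0


def snr_after_combining(w):
    """q |w^H H p|^2 / (N0 w^H C w): SNR after combining, noise covariance N0 C."""
    wh = sum(wi.conjugate() * hi for wi, hi in zip(w, h))
    return q * abs(wh) ** 2 / (N0 * quad(w, C, w).real)


Ci = inv2(C)
w = [sum(Ci[i][j] * h[j] for j in range(2)) for i in range(2)]  # w = C^{-1} H p
snr_3104 = q / N0 * quad(h, Ci, h).real
print(round(snr_after_combining(w), 9), round(snr_3104, 9), round(snr_after_combining(h), 9))
print("capacity (3.104):", math.log2(1 + snr_3104), "bit/symbol")
```

Plain MRC reaches a lower SNR whenever the noise is colored, which is why the whitening step matters.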
The transmitted signal $\mathbf{x}$ in (3.101) has the (suboptimal) covariance matrix $\mathbf{P}\mathbf{Q}\mathbf{P}^{\mathrm{H}}$ and the corresponding data rate is

$$\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{P}\mathbf{Q}\mathbf{P}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\right)\right) \quad \text{bit/symbol}, \tag{3.106}$$

which naturally cannot exceed the capacity in (3.100). We will prove that (3.106) is an achievable rate by expanding the expression until we reach a familiar form, which reveals how the receiver can operate to achieve this rate. The rate expression from the last example will be useful in the derivation.


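With arbitrary precoding, the received signal in (3.56) can be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} = \mathbf{H}\mathbf{P}\boldsymbol{\varsigma} + \mathbf{n} = \sum_{k=1}^{K}\mathbf{H}\mathbf{p}_k\varsigma_k + \mathbf{n} \tag{3.107}$$

and its covariance matrix is $\sum_{k=1}^{K} q_k\mathbf{H}\mathbf{p}_k\mathbf{p}_k^{\mathrm{H}}\mathbf{H}^{\mathrm{H}} + N_0\mathbf{I}_M$. Each signal appears at the receiver in a unique direction $\mathbf{H}\mathbf{p}_k$ in the $M$-dimensional vector space, and the $K$ directions might be linearly independent (if $K \leq M$), but they are generally not mutually orthogonal. Therefore, the signals interfere with each other, which we can deal with by decoding them sequentially and successively removing the already decoded signals; known signals cease to be interference.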
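For notational convenience, we define

$$\mathbf{y}_i = \mathbf{y} - \sum_{k=1}^{i-1}\mathbf{H}\mathbf{p}_k\varsigma_k, \quad i = 1, \ldots, K+1, \tag{3.108}$$

as the residual received signal when the first $i - 1$ data signals have been decoded and removed. This vector has the covariance matrix $N_0\mathbf{C}_i$, where

$$\mathbf{C}_i = \begin{cases} \mathbf{I}_M + \displaystyle\sum_{k=i}^{K}\frac{q_k}{N_0}\mathbf{H}\mathbf{p}_k\mathbf{p}_k^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}, & \text{if } i = 1, \ldots, K, \\ \mathbf{I}_M, & \text{if } i = K + 1. \end{cases} \tag{3.109}$$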

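The data rate in (3.106) can be rewritten using this notation as

$$\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{P}\mathbf{Q}\mathbf{P}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\right)\right) = \log_2\left(\det\left(\mathbf{I}_M + \sum_{k=1}^{K}\frac{q_k}{N_0}\mathbf{H}\mathbf{p}_k\mathbf{p}_k^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\right)\right) = \log_2\left(\det\left(\mathbf{C}_1\right)\right) \tag{3.110}$$

by utilizing (3.109) and the fact that the signal covariance matrix can be expanded as $\mathbf{P}\mathbf{Q}\mathbf{P}^{\mathrm{H}} = \sum_{k=1}^{K} q_k\mathbf{p}_k\mathbf{p}_k^{\mathrm{H}}$. We can further rewrite (3.110) as

$$\begin{aligned} \log_2(\det(\mathbf{C}_1)) &= \log_2\left(\det\left(\mathbf{C}_2 + \frac{q_1}{N_0}\mathbf{H}\mathbf{p}_1\mathbf{p}_1^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\right)\right) \\ &= \log_2\left(\det\left(\mathbf{C}_2 + \frac{q_1}{N_0}\mathbf{H}\mathbf{p}_1\mathbf{p}_1^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}_2^{-1}\mathbf{C}_2\right)\right) \\ &= \log_2\left(\det\left(\mathbf{I}_M + \frac{q_1}{N_0}\mathbf{H}\mathbf{p}_1\mathbf{p}_1^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}_2^{-1}\right)\right) + \log_2(\det(\mathbf{C}_2)) \\ &= \log_2\left(1 + \frac{q_1}{N_0}\mathbf{p}_1^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}_2^{-1}\mathbf{H}\mathbf{p}_1\right) + \log_2(\det(\mathbf{C}_2)), \end{aligned} \tag{3.111}$$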


where the last equality follows from Sylvester's determinant theorem in (2.53). The first term in (3.111) has the same structure as the capacity expression in Example 3.15; that is, it is the capacity when transmitting using the precoding vector $\mathbf{p}_1$ and having colored complex Gaussian noise with the covariance matrix $N_0\mathbf{C}_2$. This is precisely how the received signal in (3.107) is structured if we decompose it as

$$\mathbf{y} = \mathbf{H}\mathbf{p}_1\varsigma_1 + \underbrace{\sum_{k=2}^{K}\mathbf{H}\mathbf{p}_k\varsigma_k + \mathbf{n}}_{\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{C}_2)}. \tag{3.112}$$

The latter term is not conventional noise since it contains both interfering signals and receiver noise. However, from the decoding perspective, it is distributed as colored complex Gaussian noise, so it takes the role of an effective noise term. Hence, if we decode the data signal $\varsigma_1$ while treating the remaining $K - 1$ interfering signals as part of the noise, then we can achieve a data rate equal to the first term in (3.111) using LMMSE combining of the kind described in Example 3.15.

The second term log,(det(Cz2)) in (3.111) can be expanded similarly. In 
fact, for any i = 1,..., K, it holds that 


log, (det (C;)) = logs (aet (Cis: + 4 Hp.p!H") ) 
0 
= log, (aet (tu + E Hpipr "Ch )) + log, (det (Ci41)) 
= logs (1 + 4 piH'C7 Hp) + logs (det (C;+1)) , (3.113) 


where the first term is the capacity when transmitting using the precoding 
vector p; and having colored complex Gaussian noise with the covariance 
matrix NoC;41. This is how the residual received signal in (3.108) is structured 
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because it can be decomposed as 


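$$
\mathbf{y}_i = \mathbf{H}\mathbf{p}_i x_i + \underbrace{\sum_{k=i+1}^{K}\mathbf{H}\mathbf{p}_k x_k + \mathbf{n}}_{\sim \mathcal{N}_{\mathbb{C}}(\mathbf{0},\, N_0\mathbf{C}_{i+1})}. \tag{3.114}
$$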


The first term in (3.113) is, therefore, a data rate we can achieve as follows. We remove the first $i-1$ data signals from $\mathbf{y}$ to obtain $\mathbf{y}_i$, and then decode $x_i$ while treating the remaining $K-i$ interfering signals as part of the colored noise. The iterative expansion in (3.113) terminates when $i = K$. At that point, no interfering signals remain, so $\mathbf{C}_{K+1} = \mathbf{I}_M$ and $\log_2(\det(\mathbf{C}_{K+1})) = 0$.

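In summary, the data rate in (3.106) can be expanded as

$$
\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{P}\mathbf{Q}\mathbf{P}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\right)\right) = \sum_{i=1}^{K}\log_2\left(1 + \frac{q_i}{N_0}\mathbf{p}_i^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}_{i+1}^{-1}\mathbf{H}\mathbf{p}_i\right) \tag{3.115}
$$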


and is achieved by decoding the signals sequentially. Each previously decoded signal is removed, and the not-yet-decoded signals are treated as noise. This procedure is known as successive interference cancellation (SIC) and is summarized in Figure 3.17. The signal $x_1$ is first decoded using $\mathbf{y}_1 = \mathbf{y}$. Next, $\mathbf{y}_2 = \mathbf{y} - \mathbf{H}\mathbf{p}_1 x_1$ is computed, and $x_2$ is decoded using it. This procedure continues successively until $x_K$ has been decoded. The whole procedure is also known as LMMSE-SIC because each signal is decoded using LMMSE combining, as discussed in Example 3.15.
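The identity (3.115) can be checked numerically. Below is a minimal NumPy sketch with the following assumptions, none of which come from the text:

- the channel is drawn at random;
- the precoding vectors are random with unit norm;
- the power allocation is arbitrary;
- the helper names `logdet_rate` and `sic_rates` are hypothetical.

The sketch computes the left-hand side of (3.115) and the per-stream SIC terms for two decoding orders.

```python
import numpy as np

def logdet_rate(H, P, q, N0):
    """Left-hand side of (3.115): log2 det(I_M + H P Q P^H H^H / N0)."""
    M = H.shape[0]
    A = np.eye(M) + H @ P @ np.diag(q) @ P.conj().T @ H.conj().T / N0
    _, logabsdet = np.linalg.slogdet(A)   # A is Hermitian positive definite
    return logabsdet / np.log(2)

def sic_rates(H, P, q, N0):
    """Per-stream terms on the right-hand side of (3.115), decoding in column order.

    Stream i sees streams i+1, ..., K as colored noise with covariance N0 * C_{i+1};
    streams 1, ..., i-1 are assumed to be perfectly removed beforehand.
    """
    M, K = H.shape[0], P.shape[1]
    G = H @ P                              # column k is the effective channel H p_k
    rates = np.zeros(K)
    for i in range(K):
        C = np.eye(M, dtype=complex)       # C_{i+1} in the (1-based) notation of the text
        for k in range(i + 1, K):
            C += (q[k] / N0) * np.outer(G[:, k], G[:, k].conj())
        sinr = (q[i] / N0) * np.real(G[:, i].conj() @ np.linalg.solve(C, G[:, i]))
        rates[i] = np.log2(1 + sinr)
    return rates

# Illustrative setup (random channel, precoders, and powers; not from the text)
rng = np.random.default_rng(1)
M, K, N0 = 4, 3, 1.0
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
P = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
P /= np.linalg.norm(P, axis=0)             # unit-norm precoding vectors
q = np.array([1.0, 2.0, 0.5])

lhs = logdet_rate(H, P, q, N0)
forward = sic_rates(H, P, q, N0)
reverse = sic_rates(H, P[:, ::-1], q[::-1], N0)   # decode stream K first
print(lhs, forward.sum(), reverse.sum())   # equal up to rounding
print(forward, reverse[::-1])              # individual terms depend on the order
```

The two sums agree up to floating-point rounding. The individual terms, however, change with the decoding order.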

The signals were assumed to be decoded in increasing numerical order, 
which can be done without loss of generality because the precoding vectors are 
numbered arbitrarily. The expression on the left-hand side of (3.115) takes the 
same value regardless of how the precoding vectors are numbered; however, 
individual terms in the right-hand side expression will take different values 
depending on the numbering. Moreover, the right-hand side expression is 
explicitly achieved using LMMSE combining, as described in Example 3.15, 
but the choice of receiver processing is not visible in the left-hand side 
expression. The reason is that rate expressions implicitly assume an optimal 
receiver based on the available information. 

The SIC procedure is information-theoretically optimal but has several 
practical issues. Recall from Definition 2.6 that the capacity determines the 
data rate we can communicate at while achieving an arbitrarily low error 
probability as the number of symbols in the packet approaches infinity. Hence, 
to decode the signal $x_1$ actually means to decode an $N$-length codeword $x_1[1],\ldots,x_1[N]$, where $N\to\infty$ or at least is very large. Next, we need to recreate $\mathbf{H}\mathbf{p}_1 x_1[l]$ for the time instances $l = 1,\ldots,N$ in the packet. This must then be subtracted from the entire sequence of received signals $\mathbf{y}[1],\ldots,\mathbf{y}[N]$. This procedure requires extensive memory storage and causes delays proportional to $K$. Since $N$ is finite in practice, there will also be a non-zero error probability for each stream. When an error occurs, the wrong data signal will be subtracted from the received signals. This increases rather than reduces the amount of interference. It is called error propagation because it will likely result in decoding errors for all the remaining undecoded streams.


Example 3.16. What is the achievable data rate if we decode the K signals 
separately without using SIC? 
When we decode signal i, we can express the received signal in (3.107) as 


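$$
\mathbf{y} = \mathbf{H}\mathbf{p}_i x_i + \underbrace{\sum_{k=1,\,k\neq i}^{K}\mathbf{H}\mathbf{p}_k x_k + \mathbf{n}}_{\sim \mathcal{N}_{\mathbb{C}}(\mathbf{0},\, N_0\mathbf{C}_{-i})}, \tag{3.116}
$$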


where the colored noise is based on the covariance matrix 


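$$
\mathbf{C}_{-i} = \mathbf{I}_M + \frac{1}{N_0}\sum_{k=1,\,k\neq i}^{K} q_k\mathbf{H}\mathbf{p}_k\mathbf{p}_k^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}. \tag{3.117}
$$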


It follows from Example 3.15 that the achievable data rate when treating 
interference as colored noise is 


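$$
\log_2\left(1 + \frac{q_i}{N_0}\mathbf{p}_i^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}_{-i}^{-1}\mathbf{H}\mathbf{p}_i\right), \tag{3.118}
$$

which is achieved using the LMMSE combining $\mathbf{w}_i = \mathbf{C}_{-i}^{-1}\mathbf{H}\mathbf{p}_i$. The achievable data rate of all the $K$ data streams then becomes

$$
\sum_{i=1}^{K}\log_2\left(1 + \frac{q_i}{N_0}\mathbf{p}_i^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\mathbf{C}_{-i}^{-1}\mathbf{H}\mathbf{p}_i\right). \tag{3.119}
$$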


This value is generally smaller than (3.115), and never larger, because of the lack of SIC. Each stream is decoded in the presence of more interference. However, the receiver processing is easier to implement in practice since the $K$ data streams can be decoded in parallel. Moreover, unlike SIC, there is no risk of error propagation.
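The linear receiver can be sketched in the same way, under the same illustrative assumptions: random channel, precoders, and powers, plus hypothetical helper names. For each stream, the code does three things:

- evaluates (3.118);
- applies the combining vector $\mathbf{w}_i = \mathbf{C}_{-i}^{-1}\mathbf{H}\mathbf{p}_i$ and computes the SINR of $\mathbf{w}_i^{\mathrm{H}}\mathbf{y}$ directly from its signal, interference, and noise powers;
- compares the sum rate (3.119) with the SIC rate (3.115).

```python
import numpy as np

def logdet_rate(H, P, q, N0):
    """Left-hand side of (3.115): log2 det(I_M + H P Q P^H H^H / N0)."""
    M = H.shape[0]
    A = np.eye(M) + H @ P @ np.diag(q) @ P.conj().T @ H.conj().T / N0
    return np.linalg.slogdet(A)[1] / np.log(2)

def linear_rates(H, P, q, N0):
    """Per-stream rates (3.118); each stream treats all other streams as colored noise."""
    M, K = H.shape[0], P.shape[1]
    G = H @ P                                      # column k is H p_k
    rates, sinr_direct = np.zeros(K), np.zeros(K)
    for i in range(K):
        # C_{-i} in (3.117): noise plus interference from every stream except i
        C = np.eye(M, dtype=complex)
        for k in range(K):
            if k != i:
                C += (q[k] / N0) * np.outer(G[:, k], G[:, k].conj())
        w = np.linalg.solve(C, G[:, i])            # LMMSE combining vector
        rates[i] = np.log2(1 + (q[i] / N0) * np.real(G[:, i].conj() @ w))
        # SINR of w^H y computed from its signal, interference, and noise powers
        gains = np.abs(w.conj() @ G) ** 2          # |w^H H p_k|^2 for all k
        interference = np.sum(q * gains) - q[i] * gains[i]
        sinr_direct[i] = q[i] * gains[i] / (interference + N0 * np.linalg.norm(w) ** 2)
    return rates, sinr_direct

# Illustrative setup (random channel, precoders, and powers; not from the text)
rng = np.random.default_rng(1)
M, K, N0 = 4, 3, 1.0
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
P = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
P /= np.linalg.norm(P, axis=0)
q = np.array([1.0, 2.0, 0.5])

lin, sinr_direct = linear_rates(H, P, q, N0)
print(lin.sum(), logdet_rate(H, P, q, N0))   # linear sum rate vs. (3.115)
```

Because any scaling of $\mathbf{w}_i$ cancels in the SINR, the unnormalized vector $\mathbf{C}_{-i}^{-1}\mathbf{H}\mathbf{p}_i$ suffices here.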


This example describes a setup where each data stream is decoded independently while treating the other streams as colored noise. This is called linear processing because the receiver only performs a linear operation before the signal decoding. It multiplies the received signal with a receive combining vector of the LMMSE kind in (3.105). A block diagram is shown in Figure 3.18, where we can notice that the $K$ decoding branches are parallel and independent. By contrast, the LMMSE-SIC receiver processing in Figure 3.17 is non-linear because of the successive removal of interference, which connects the decoding of the different signals.


3.4. Capacity of MIMO Channels 195 


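[Figure 3.17 diagram: a chain of blocks "Decode signal 1 with interf. from $x_2,\ldots,x_K$", "Decode signal 2 with interf. from $x_3,\ldots,x_K$", ..., "Decode signal $K$ without interference". The decoded contributions (e.g., $-\mathbf{H}\mathbf{p}_2 x_2$) are subtracted between the stages.]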


Figure 3.17: A block diagram of the LMMSE-SIC receiver processing. When the precoding does not divide the MIMO channel into parallel SISO channels, the receiver can instead decode the data signals sequentially to deal with interference. Each decoded signal is subtracted from the received signal vector before the next signal is decoded, while the remaining interfering signals are treated as colored noise. This is called successive interference cancellation.


[Figure 3.18 diagram: $K$ parallel blocks "Decode signal 1 with interf. from $x_2,\ldots,x_K$", "Decode signal 2 with interf. from $x_1, x_3,\ldots,x_K$", ..., "Decode signal $K$ with interf. from $x_1,\ldots,x_{K-1}$".]


Figure 3.18: A block diagram of linear MIMO receiver processing. Each data stream is decoded separately while treating the interference from the other signals as colored noise.
Figure 3.19: The capacity is compared with the data rates achieved with suboptimal precoding (i.e., sending an independent data stream per antenna) and two types of receiver processing: the LMMSE-SIC receiver in Figure 3.17 and the linear receiver in Figure 3.18.


Figure 3.19 compares the MIMO channel capacity with the data rates achieved with equal power allocation and the precoding $\mathbf{P} = \mathbf{I}_K$, which transmits one independent signal per antenna. In the latter case, we consider both the non-linear LMMSE-SIC receiver in Figure 3.17 and the simplified linear receiver in Figure 3.18. We consider a setup with $M = K = 4$ antennas. To obtain a slightly asymmetric channel matrix, we let the entries have unit magnitude but independent random phases between $0$ and $2\pi$. The figure shows the average rates (over different random phases) as a function of $\mathrm{SNR} = q/N_0$.

The LMMSE-SIC curve is below the capacity due to the suboptimal precoding. However, it approaches the capacity at high SNR because water-filling converges to equal power allocation in this regime. Hence, $\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}} \approx \mathbf{V}\mathbf{V}^{\mathrm{H}} = \mathbf{I}_K$, which is the same signal covariance matrix as when $\mathbf{P} = \mathbf{I}_K$ and equal power allocation are used.

The linear receiver is affected by more interference than the LMMSE-SIC receiver. Nevertheless, the two perform equally well at low SNRs, where the interference is negligible compared to the noise. There is a substantial performance loss at high SNRs. Still, the curve with the linear receiver has the same slope as the capacity curve, which shows that the same multiplexing gain of $r = 4$ is achieved in all cases.
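A Monte Carlo sketch of the setup behind Figure 3.19 is given below. It rests on these assumptions:

- the total transmit power $q$ is split equally as $q/K$ per antenna when $\mathbf{P} = \mathbf{I}_K$;
- the capacity uses water-filling over the eigenvalues of $\mathbf{H}^{\mathrm{H}}\mathbf{H}$ under the same total power;
- the number of trials and the SNR grid are arbitrary choices;
- the function names are hypothetical.

The printed numbers are not the data behind the figure.

```python
import numpy as np

def capacity_waterfilling(H, snr):
    """MIMO capacity (bit/symbol) with water-filling; snr = q/N0 is the total SNR."""
    lam = np.sort(np.linalg.eigvalsh(H.conj().T @ H))[::-1]
    lam = lam[lam > 1e-12]                       # keep the non-zero eigenvalues
    for n in range(len(lam), 0, -1):             # try n active eigenmodes
        mu = (snr + np.sum(1 / lam[:n])) / n     # water level (normalized by N0)
        if mu - 1 / lam[n - 1] >= 0:             # weakest active mode gets power >= 0
            return np.sum(np.log2(mu * lam[:n]))
    return 0.0

def equal_power_rates(H, snr):
    """(LMMSE-SIC, linear) sum rates for P = I_K with power snr/K per antenna."""
    M, K = H.shape
    p = snr / K
    sic_rate = np.linalg.slogdet(np.eye(M) + p * H @ H.conj().T)[1] / np.log(2)
    linear_rate = 0.0
    for i in range(K):
        others = np.delete(H, i, axis=1)
        C = np.eye(M) + p * others @ others.conj().T        # C_{-i} in (3.117)
        sinr = p * np.real(H[:, i].conj() @ np.linalg.solve(C, H[:, i]))
        linear_rate += np.log2(1 + sinr)
    return sic_rate, linear_rate

rng = np.random.default_rng(0)
M = K = 4
snr_db = np.arange(-10, 31, 10)
trials = 200
cap = np.zeros(len(snr_db))
sic = np.zeros(len(snr_db))
lin = np.zeros(len(snr_db))
for _ in range(trials):
    H = np.exp(2j * np.pi * rng.random((M, K)))   # unit magnitude, random phases
    for j, snr in enumerate(10 ** (snr_db / 10)):
        cap[j] += capacity_waterfilling(H, snr) / trials
        r_sic, r_lin = equal_power_rates(H, snr)
        sic[j] += r_sic / trials
        lin[j] += r_lin / trials

for s, c, a, b in zip(snr_db, cap, sic, lin):
    print(f"SNR {int(s):3d} dB: capacity {c:6.2f}, LMMSE-SIC {a:6.2f}, linear {b:6.2f}")
```

The ordering capacity ≥ LMMSE-SIC ≥ linear holds for every channel realization. It therefore also holds for the averages.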
3.5 Exercises


Exercise 3.1. Consider the capacity $B\log_2\left(1 + \frac{P\beta}{BN_0}\right)$ of a SISO channel.


(a) Show that the capacity goes to zero when $B \to 0$. What is the name of this operating regime?

(b) What happens to the capacity when $B \to \infty$? What is the name of this operating regime?


Exercise 3.2. Consider the capacity $C(P,B) = B\log_2\left(1 + \frac{P\beta}{BN_0}\right)$ of a SISO channel.


(a) Compute the first-order derivative of the capacity with respect to $P$. At what value of $P$ does the capacity grow the fastest? What happens with the capacity growth as $P \to \infty$?

(b) Compute the second-order derivative of the capacity with respect to $P$. Show that it is negative; that is, the capacity is a concave function of $P$.

(c) Compute the first-order derivative of the capacity with respect to $B$. At what value of $B$ does the capacity grow the fastest? What happens with the capacity growth as $B \to \infty$? Hint: Use the inequality $\frac{x}{1+x} < \ln(1+x)$ for $x > 0$.

(d) Compute the second-order derivative of the capacity with respect to $B$. Show that it is negative; that is, the capacity is a concave function of $B$.


Exercise 3.3. Consider the capacity $C(P,B) = B\log_2\left(1 + \frac{P\beta}{BN_0}\right)$ of a SISO channel.


(a) Suppose there is a reference setup where $P$ and $B$ have been selected such that $P\beta/(BN_0) = 7$. We want to change the bandwidth from $B$ to $cB$ for some scalar $c > 1$ to at least double the capacity while keeping all other variables constant. Which option will at least double the capacity (compared to $c = 1$): increasing the bandwidth to $2B$ or to $6B$?

(b) Repeat (a) for the case when the reference setup has $P\beta/(BN_0) = 1$. Can we find a value of $c$ that doubles the capacity (compared to $c = 1$)? Hint: Utilize the fact that $f(c) = c\log_2\left(1 + \frac{1}{c}\right) - 2 < 0$ for all $c > 1$.

(c) Use the asymptotic limit of the capacity as $B \to \infty$ (i.e., $\log_2(e)\frac{P\beta}{N_0}$) to derive the condition on the initial selection of $P\beta/(BN_0)$ so that we can double the capacity by increasing the bandwidth to $cB$ for some $c > 0$. Use this relation to verify your answers to parts (a) and (b).


Exercise 3.4. Consider a system where the received signal power is Px = 107°? W and 
the bandwidth is B = 100 MHz. There is an AWGN channel between a transmitting 
single-antenna user device and a receiving single-antenna base station. 


(a) Give an expression for the channel capacity as a function of $P_{\mathrm{rx}}$, $B$, and the noise power spectral density $N_0$.

(b) What is the channel capacity in bit/s using the numbers given above and No = 10-1" W/Hz?

(c) Suppose we equip the base station with multiple antennas and each antenna receives the same power $P_{\mathrm{rx}}$ as above. How many antennas do we need to get 8 times higher capacity than in (b)?
Exercise 3.5. Consider the SIMO channel $\mathbf{y} = \mathbf{h}x + \mathbf{n}$. The input signal $x$ has the power limit $\mathbb{E}\{|x|^2\} \leq q$, and the noise vector $\mathbf{n}$ has independent and identically distributed $\mathcal{N}_{\mathbb{C}}(0, N_0)$-entries. The channel $\mathbf{h}$ is an $M$-length vector with only ones, where $M$ denotes the number of receive antennas.

(a) What is the capacity of this channel? What kind of input distribution achieves the capacity?

(b) Suppose $q/N_0 = 1$. How many antennas do we need to achieve a capacity of 6 bit/symbol?

(c) Suppose we have $M = 10$ antennas. How large an SNR $q/N_0$ do we need to achieve a capacity of 6 bit/symbol?

(d) Suppose all entries of $\mathbf{h}$ are equal to two instead of one. What is the capacity of this channel?

(e) Suppose all entries of $\mathbf{h}$ are equal to $-1$ instead of $+1$. What is the capacity of this channel? Compare it with (a) and explain the intuition behind the result.


Exercise 3.6. Suppose we are designing an uplink communication system that should 
provide (at least) 400 Mbit/s at every point in its coverage area. The transmit power 
is 0.1 W, the bandwidth is 100 MHz, and the noise power spectral density is No = 
10~!” W/Hz. The propagation distance is denoted by d and the gain of the channel is 
|h|? = 1078 (1 km/d)*. 
(a) Use the capacity formula for a SISO channel to determine for which range of 
distances, d, we can deliver the required data rate. 


(b) We would like to extend the range to $d = 2$ km, but we cannot increase the transmit power at the user devices. Instead, we will use multiple antennas at the receiving base station. Suppose the channel gain $|h_m|^2$ is the same for each receive antenna $m$ and matches the SISO case. How many antennas are needed?


Exercise 3.7. Consider a SIMO channel where the single-antenna transmitter sends the signal $x \sim \mathcal{N}_{\mathbb{C}}(0,q)$ to a receiver with $M$ antennas. The received signal is denoted as $\mathbf{y} = \mathbf{h}x + \mathbf{n}$, where $\mathbf{h}$ is constant and $\mathbf{n}$ is complex Gaussian noise. The receive combining vector $\mathbf{w}$ is applied to the received signal $\mathbf{y}$ to detect the signal $x$ from $\mathbf{w}^{\mathrm{H}}\mathbf{y}$.

(a) Suppose the noise vector $\mathbf{n}$ is colored. This means that its covariance matrix $\mathrm{Cov}\{\mathbf{n}\} = \mathbf{C}$ is not equal to a scaled identity matrix but is invertible. Derive the receive combining vector $\mathbf{w}$ that maximizes the SNR. Hint: Define $\mathbf{a} = \mathbf{C}^{1/2}\mathbf{w}$ and optimize $\mathbf{a}$ instead.

(b) Consider a hypothetical system where $\mathbf{C}$ is a singular matrix and $\mathbf{h}$ is a non-zero vector in the nullspace of $\mathbf{C}$ (i.e., $\mathbf{C}\mathbf{h} = \mathbf{0}$). What is the largest SNR that we can achieve in such a system?


Exercise 3.8. Consider a MISO system with two transmit antennas where the received signal is $y = h_1 x_1 + h_2 x_2 + n$, and $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the independent receiver noise.

(a) Suppose the two transmit antennas send the independent signals $x_1 \sim \mathcal{N}_{\mathbb{C}}(0, q_1)$ and $x_2 \sim \mathcal{N}_{\mathbb{C}}(0, q_2)$, where the powers satisfy the constraint $q_1 + q_2 \leq q$. What is the resulting data rate? Which values of $q_1$ and $q_2$ will maximize that rate? Hint: The data rate expression in (3.106) can be utilized.

(b) Compare the data rate from (a) with the MISO channel capacity. Under which conditions on $h_1$ and $h_2$ are they equal?
Exercise 3.9. Consider a downlink channel where the user device has one receive antenna and the base station has three transmit antennas. The transmit power $P$, bandwidth $B = 100$ MHz, and noise power spectral density $N_0$ are selected such that $P/(BN_0) = 2$. Suppose the channel vector is $\mathbf{h} = [3, 1, -4]^{\mathrm{T}}$.


(a) What is the capacity of this channel in bit/s? What does the capacity-achieving 
MRT vector become? 


(b) Suppose the base station hardware has restricted capabilities, so that each entry of the precoding vector $\mathbf{p}$ must have a magnitude equal to $1/\sqrt{3}$, such that $\|\mathbf{p}\| = 1$. However, we can choose any sign/phase of the entries in the complex-valued vector $\mathbf{p}$. How should we select the precoding vector to achieve the largest possible data rate (bit/s)?


(c) Compare the rate values from (a) and (b), and provide a high-level explanation of 
the difference. 


Exercise 3.10. Consider the discrete memoryless point-to-point MIMO channel with the input $\mathbf{x} \in \mathbb{C}^K$ and output $\mathbf{y} \in \mathbb{C}^M$ given by $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$. The receiver noise $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C})$ is independent of $\mathbf{x}$ but has an arbitrary non-singular covariance matrix $\mathbf{C} \in \mathbb{C}^{M\times M}$. State the generalized version of Theorem 3.1 that supports such noise covariance matrices. Hint: Begin by whitening the noise.


Exercise 3.11. Consider a point-to-point MIMO system with No = 1 and the channel 
matrix 


1 0 
ee o0 
H=| 0 3+4 (3.120) 


0 -v5 


(a) What is the channel capacity? What is the covariance matrix of the capacity-achieving input distribution?

(b) Consider the dual channel $\mathbf{H}^{\mathrm{T}}$ with $q/N_0 = 1$. What is the channel capacity? What is the covariance matrix of the capacity-achieving input distribution?


Exercise 3.12. Consider a point-to-point MIMO system with $q/N_0 = 2$. Find the water-filling power allocation and capacity for each of the following channel matrices:


1 1 1 0 1 1 
on- i] onf g] o sf 


Exercise 3.13. Consider a point-to-point MIMO channel with the channel matrix 


i» i 
H= Es 4] i (3.121) 
v3 VB 


(a) For what value of $q/N_0$ is the capacity 2 bit/symbol if we use only the first antenna at the transmitter and the first antenna at the receiver?

(b) For what value of $q/N_0$ is the capacity 2 bit/symbol if we use only the first antenna of the transmitter but both antennas at the receiver?

(c) For what value of $q/N_0$ is the capacity 2 bit/symbol if we use the whole $2\times 2$ MIMO channel? Compare the results in (a), (b), and (c).
Exercise 3.14. Consider the transmission over a point-to-point MIMO channel with $M = K = 2$. We will use the SNR notation $\rho = q/N_0$.


(a) Suppose the channel matrix is 
—jr/3 —jn/3 
H= i : | ; (3.122) 


Compute the capacity of this channel as a function of $\rho$. Explain how the capacity is achieved and what kind of gain is achieved compared to the corresponding SISO channel, which has capacity $\log_2(1+\rho)$.


(b) Suppose the channel matrix is 
H= $ a . (3.123) 


Compute the capacity of this channel as a function of $\rho$. Explain how the capacity is achieved and what kind of gain is achieved compared to the corresponding SISO channel.

(c) For which values of the SNR $\rho$ is the capacity in (b) larger than in (a)?


Exercise 3.15. Consider a MIMO channel with the channel matrix $\mathbf{H} \in \mathbb{C}^{M\times K}$. All the entries of $\mathbf{H}$ have unit magnitude.

(a) Assume that $M \geq K$ and all the columns of $\mathbf{H}$ are mutually orthogonal. What is the channel capacity for a given value of $q/N_0$?

(b) Compute the first-order derivative of the capacity expression in (a) with respect to $K$. Is it an increasing or decreasing function? Hint: Use the inequality $\frac{x}{1+x} < \ln(1+x)$ for $x > 0$.

(c) Compute the second-order derivative of the capacity expression with respect to $K$. At what value of $K$ does the capacity grow the fastest?

(d) Suppose that $K = M$. How does the capacity depend on $K$ (and $M$) in this case?

(e) How does the capacity depend on $M$ and $K$ when $q/N_0$ is close to zero?


Exercise 3.16. Consider an $M\times M$ MIMO channel matrix $\mathbf{H}$ with the singular values $s_1,\ldots,s_M$. The eigenvalues $\lambda_1,\ldots,\lambda_M$ of $\mathbf{H}\mathbf{H}^{\mathrm{H}}$ satisfy $\lambda_m = s_m^2$ for $m = 1,\ldots,M$. Suppose we are free to select the eigenvalues, under the constraint that they are
positive and that a ee 


(a) Which selection of eigenvalues maximizes the capacity at low SNRs? Hint: Use 
that the water-filling only assigns power to the largest eigenvalue at low SNRs. 


(b) Which selection of eigenvalues maximizes the capacity at high SNRs? Hint: Use 
(3.3), Lemma 3.2, and that water-filling assigns power equally among the eigenval- 
ues at high SNRs. 
Exercise 3.17. Consider the $M\times M$ MIMO channel where the received signal is

$$
\mathbf{y} = \mathbf{H}\mathbf{P}\mathbf{x} + \mathbf{n}. \tag{3.124}
$$

Suppose the precoding matrix is $\mathbf{P} = \mathbf{H}^{\mathrm{H}}(\mathbf{H}\mathbf{H}^{\mathrm{H}})^{-1}\mathbf{D}$, where $\mathbf{D} = \mathrm{diag}(d_1,\ldots,d_M)$ is a diagonal matrix.

(a) The columns of the precoding matrix are the precoding vectors $\mathbf{p}_1,\ldots,\mathbf{p}_M$. How can $\mathbf{D}$ be selected to ensure each precoding vector has a unit norm?


(b) Show that this precoding matrix creates M parallel SISO channels. What is the 
channel gain on each such channel? 


(c) Suppose H = f . Compare the gains of the parallel channels achieved in (b) 


1 2 
with the gains of the parallel channels obtained using the SVD. Which approach 
gives the largest sum of the channel gains? 


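Exercise 3.18. Consider the $M\times K$ MIMO channel with the received signal

$$
\mathbf{y} = \mathbf{H}\mathbf{P}\mathbf{x} + \mathbf{n} = \sum_{k=1}^{K}\mathbf{H}\mathbf{p}_k x_k + \mathbf{n}. \tag{3.125}
$$

An arbitrary precoding matrix $\mathbf{P} = [\mathbf{p}_1,\ldots,\mathbf{p}_K] \in \mathbb{C}^{K\times K}$ with unit-norm columns $\mathbf{p}_k$ is used to send the $K$ independent data signals in $\mathbf{x} = [x_1,\ldots,x_K]^{\mathrm{T}}$. We assume that $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ and $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{Q})$ with a fixed power allocation matrix $\mathbf{Q} = \mathrm{diag}(q_1,\ldots,q_K)$. The resulting data rate in (3.106) is equal to the mutual information between $\mathbf{x}$ and $\mathbf{y}$, i.e., $\mathcal{I}(\mathbf{x};\mathbf{y})$.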


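(a) The chain rule for mutual information is given as

$$
\mathcal{I}(x_1,\ldots,x_n;\mathbf{y}) = \mathcal{I}(x_1;\mathbf{y}) + \mathcal{I}(x_2;\mathbf{y}|x_1) + \mathcal{I}(x_3;\mathbf{y}|x_1,x_2) + \ldots + \mathcal{I}(x_n;\mathbf{y}|x_1,x_2,\ldots,x_{n-1}), \tag{3.126}
$$

where $\mathcal{I}(x_n;\mathbf{y}|x_1,x_2,\ldots,x_{n-1})$ is the mutual information between $x_n$ and $\mathbf{y}$ given the knowledge of $x_1,x_2,\ldots,x_{n-1}$. Express the data rate for the considered MIMO channel $\mathcal{I}(\mathbf{x};\mathbf{y})$ in terms of $\mathcal{I}(x_i;\mathbf{y}|x_1,x_2,\ldots,x_{i-1})$ using the chain rule for mutual information.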


(b) Consider the LMMSE-SIC receiver processing illustrated in Figure 3.17. At stage $i$, the LMMSE receiver $\mathbf{w}_i = \mathbf{C}_{i+1}^{-1}\mathbf{H}\mathbf{p}_i$ is applied to the residual $\mathbf{y}_i = \mathbf{y} - \sum_{k=1}^{i-1}\mathbf{H}\mathbf{p}_k x_k$ to decode $x_i$. Since $\mathbf{x}$ and $\mathbf{n}$ are Gaussian distributed, the LMMSE receiver is also the MMSE receiver. Using this, show that $\mathcal{I}(x_i;\mathbf{y}|x_1,x_2,\ldots,x_{i-1}) = \mathcal{I}(x_i;\mathbf{w}_i^{\mathrm{H}}\mathbf{y}_i|x_1,x_2,\ldots,x_{i-1})$, i.e., that the MMSE receiver at each stage is information lossless.


(c) Using (a) and (b), conclude that the LMMSE-SIC receiver processing is information- 
theoretically optimal. 


Chapter 4 


Line-of-Sight Point-to-Point MIMO Channels 


The capacity of a point-to-point MIMO channel with an arbitrary channel matrix $\mathbf{H} \in \mathbb{C}^{M\times K}$ was derived in the last chapter. In this chapter, we
will derive a model for H in free-space line-of-sight (LOS) propagation and 
use it to analyze the capacity behavior further using the previously derived 
expressions. There is only one path between each transmit antenna and each 
receive antenna in free-space LOS channels, namely the direct path obtained 
by drawing a straight line between the antennas. This is an exact model of 
space communications, where no objects create additional signal paths by 
reflecting or scattering the transmitted signal. It can also be a reasonably 
accurate model of LOS channels on Earth, where there are objects that create 
additional signal paths, but these generally have much smaller channel gains 
than the direct path. This is particularly the case for the high-band spectrum, 
where the reflected paths typically are weaker while the LOS path is not. 

As in Chapter 3, we start with the special cases of SIMO and MISO 
channels, where only one side of the channel utilizes multiple antennas. The 
results will then be extended to the MIMO case. 


4.1 Basic Properties of Antenna Arrays 


Within the context of this book, an antenna array is a collection of antennas that operate jointly at the transmitter or receiver side of a communication system. In Chapter 3, we used $K$ and $M$ to denote the number of transmit antennas and receive antennas, respectively, which then became the dimensions of the channel matrix $\mathbf{H} \in \mathbb{C}^{M\times K}$. Two properties determine the channel matrix: the array geometries at the transmitter and receiver, and the propagation
environment between them. 

Figure 4.1 exemplifies three different array geometries where all the anten- 
nas are deployed on a two-dimensional plane. Each array is characterized by 
the convex enclosure containing all the antennas, called the aperture, and its 
antenna arrangement. Figure 4.1(a) shows an antenna array with an irregularly shaped aperture and a non-uniform antenna arrangement. Such arrays are seldom encountered in practice. In fact, the word array is often associated with a regular geometrical arrangement. Figure 4.1(b) shows a linear array with a uniform antenna spacing in one dimension. Figure 4.1(c) shows a planar array with a uniform antenna spacing in two dimensions.

[Figure 4.1 panels: (a) Irregular antenna array. (b) Linear antenna array. (c) Planar antenna array. The panels mark the aperture and the aperture length; panel (c) also marks the horizontal and vertical lengths.]

Figure 4.1: An antenna array is characterized by its aperture (i.e., the convex enclosure of all antennas) and the antenna arrangement within the aperture. Three examples are given in this figure, where the filled circles represent the individual antennas. The largest dimension of the aperture is called the aperture length.


Definition 4.1. The aperture length $D$ is the largest separation between any two antennas in an array. It is called the normalized aperture length when normalized by the wavelength $\lambda$ and is then denoted as $D_\lambda = D/\lambda$.


The aperture length is indicated for each of the three examples in Figure 4.1. 
It is the distance between the first and last antenna in a linear array. In contrast, 
it is the distance between the antennas in opposite corners in a planar array. 
The aperture length will play an important role throughout this chapter. As 
indicated in Figure 4.1(c), a planar array’s horizontal and vertical lengths will 
also play a role when analyzing such arrays. 


4.2 Modeling of Line-of-Sight SIMO Channels 


The SIMO capacity analysis in the previous chapter was based on the discrete-time complex-baseband channel model $y_m[l] = h_m x[l] + n_m[l]$ in (3.12). Here, $y_m[l]$ is the received signal at the $m$th antenna, $h_m$ is the corresponding channel coefficient, $x[l]$ is the transmitted signal, and $n_m[l]$ is the noise. The analysis considered arbitrary values of $h_1,\ldots,h_M$. The purpose of this section is to derive expressions for these coefficients in a scenario where an isotropic antenna transmits in free space to an array of $M$ isotropic antennas, as illustrated in Figure 4.2.

[Figure 4.2 diagram: a transmitter and a receiver with $M$ antennas, with the distance $d_m$ to each receive antenna marked.]

Figure 4.2: A free-space SIMO LOS channel where $d_m$ is the distance between the transmitter and the $m$th receive antenna for $m = 1,\ldots,M$. The array of receive antennas has an arbitrary geometry in this figure.

We need to start the derivation from a continuous-time signal model since 
the physical channel affects this physical signal. The single-antenna transmitter 
sends the passband signal

$$
x_p(t) = \sqrt{2}\,\Re\left(x(t)e^{j2\pi f_c t}\right) \tag{4.1}
$$

of the kind previously defined in (2.111), where $x(t)$ is the complex-baseband PAM signal in (2.120) and $f_c$ is the carrier frequency.

We denote by $d_m$ the physical distance (in meters) between the transmitter and the $m$th receive antenna, for $m = 1,\ldots,M$. Using this notation, the received passband signal at the $m$th antenna is (before noise is added)

$$
y_{p,m}(t) = \sqrt{\beta_m}\,x_p\left(t - \frac{d_m}{c}\right), \tag{4.2}
$$

where $d_m/c$ is the propagation time delay, $c$ denotes the speed of light, and

$$
\beta_m = \frac{\lambda^2}{(4\pi)^2}\frac{1}{d_m^2} \tag{4.3}
$$

is the free-space channel gain computed as in (1.7).
From (4.2), we notice that the transmitted signal is attenuated by a factor $\sqrt{\beta_m}$ and delayed by $d_m/c$ seconds. This matches the channel model type introduced in Section 2.3.3 with $L = 1$ path. As mentioned in that section, the receiver must delay its clock by $\eta$ seconds to compensate for the propagation delay before sampling the received signal. Following (2.125), the continuous-time channel impulse response to the $m$th receive antenna in the complex baseband then becomes


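$$
g_m(t) = \sqrt{\beta_m}\,e^{-j2\pi f_c\left(\frac{d_m}{c} - \eta\right)}\,\mathrm{sinc}\left(B\left(t + \eta - \frac{d_m}{c}\right)\right), \quad m = 1,\ldots,M. \tag{4.4}
$$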


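Furthermore, it follows from (2.128) that the received signal at the $m$th antenna after sampling is

$$
y_m[l] = \sum_{k=-\infty}^{\infty} x[k]\sqrt{\beta_m}\,e^{-j2\pi f_c\left(\frac{d_m}{c} - \eta\right)}\,\mathrm{sinc}\left(l - k + B\left(\eta - \frac{d_m}{c}\right)\right) + n_m[l]. \tag{4.5}
$$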


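To avoid intersymbol interference, we would like to select the sampling delay $\eta$ such that

$$
\mathrm{sinc}\left(l - k + B\left(\eta - \frac{d_m}{c}\right)\right) \approx \mathrm{sinc}(l - k) = \begin{cases} 1, & l = k, \\ 0, & l \neq k. \end{cases} \tag{4.6}
$$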
Exact equality is achieved for $\eta = d_m/c$, but this value depends on the antenna index $m$. Since each antenna experiences a different propagation delay, we generally cannot find one value of $\eta$ that achieves exact equality for all of them. Suppose we use the first antenna ($m = 1$) as the timing reference for the sampling by setting $\eta = d_1/c$. We then want $B(\eta - d_m/c)$ to be close to zero for $m = 2,\ldots,M$. This means that the maximum difference in propagation delay with respect to the reference antenna should be much shorter than the symbol time $1/B$:

$$
\max_{m\in\{2,\ldots,M\}}\frac{|d_m - d_1|}{c} \ll \frac{1}{B}. \tag{4.7}
$$

The distances $d_1,\ldots,d_M$ depend on the transmitter's location relative to the array. Their maximum difference, however, is limited by the array geometry: by the triangle inequality, $|d_m - d_1|$ cannot exceed the separation between antennas $m$ and 1. The aperture length $D$ was introduced in Definition 4.1 as the maximum separation between any two antennas in the array. The worst-case delay scenario can be constructed as follows:

1. Identify two receive antennas separated by $D$.
2. Make one of them antenna 1.
3. Place the transmitter on the line that connects these receive antennas.

The condition in (4.7) becomes

$$
\frac{D}{c} \ll \frac{1}{B} \tag{4.8}
$$

in this worst-case scenario and is satisfied for many array sizes and signal bandwidths. For example, if the aperture length is $D = 1$ m, then $D/c = 1/B$ holds for $B = c/D = 300$ MHz, so (4.8) requires bandwidths well below 300 MHz. Many practical systems use much smaller bandwidths (e.g., 20 MHz) and often smaller arrays. Hence, we will use the approximation in (4.6) in this chapter. General channel modeling without this approximation will be considered in Chapter 7.


Example 4.1. Suppose the condition in (4.8) is assumed to hold if $D/c \leq 0.1/B$. What is the maximum allowed normalized aperture length if

(a) $f_c = 3$ GHz and $B = 20$ MHz;

(b) $f_c = 30$ GHz and $B = 300$ MHz;

(c) $f_c = 30$ GHz and $B = 1$ GHz?

The normalized aperture length is defined as $D_\lambda = D/\lambda$. Since $c = \lambda f_c$, we have $D/c = D_\lambda/f_c$, so we need to satisfy the condition

$$
\frac{D_\lambda}{f_c} \leq \frac{0.1}{B},
$$

which leads to the following maximum allowed normalized aperture lengths:

(a) $D_\lambda \leq \frac{0.1 f_c}{B} = \frac{0.1\cdot 3\cdot 10^9}{20\cdot 10^6} = 15$ wavelengths;

(b) $D_\lambda \leq \frac{0.1\cdot 30\cdot 10^9}{300\cdot 10^6} = 10$ wavelengths;

(c) $D_\lambda \leq \frac{0.1\cdot 30\cdot 10^9}{10^9} = 3$ wavelengths.
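The arithmetic in Example 4.1 can be reproduced with a few lines of Python. The sketch assumes $c = 3\cdot 10^8$ m/s and uses a hypothetical helper `max_normalized_aperture`. It also does two things beyond the example:

- converts each limit to a physical aperture length via $D = D_\lambda c/f_c$;
- evaluates the worst-case bandwidth $B = c/D$ for $D = 1$ m discussed before the example.

```python
import math

C_LIGHT = 3e8  # speed of light in m/s, rounded as in the text

def max_normalized_aperture(fc, B, margin=0.1):
    """Largest D_lambda with D/c <= margin/B, using D/c = D_lambda/fc."""
    return margin * fc / B

cases = {"(a)": (3e9, 20e6), "(b)": (30e9, 300e6), "(c)": (30e9, 1e9)}
limits = {}
for label, (fc, B) in cases.items():
    limits[label] = max_normalized_aperture(fc, B)
    D = limits[label] * C_LIGHT / fc          # physical aperture length in meters
    print(f"{label} D_lambda <= {limits[label]:.1f} wavelengths, i.e., D <= {D:.2f} m")

# Bandwidth at which D/c = 1/B for a 1 m aperture, cf. the discussion of (4.8)
print(f"B = c/D = {C_LIGHT / 1.0 / 1e6:.0f} MHz")
```

The higher carrier frequencies in (b) and (c) shrink the wavelength. The physical aperture limits therefore shrink faster than the normalized ones.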


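By substituting (4.6) into (4.5) with $\eta = d_1/c$, the system model simplifies to

$$
y_m[l] = \sqrt{\beta_m}\,e^{-j2\pi f_c\frac{d_m - d_1}{c}} x[l] + n_m[l] = \underbrace{\sqrt{\beta_m}\,e^{-j2\pi\frac{d_m - d_1}{\lambda}}}_{=h_m} x[l] + n_m[l], \tag{4.9}
$$

where the second equality utilizes the fact that the wavelength at the carrier frequency is $\lambda = c/f_c$. We can identify the value of $h_m$ from (4.9):

$$
h_m = \sqrt{\beta_m}\,e^{-j2\pi\frac{d_m - d_1}{\lambda}}. \tag{4.10}
$$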
This channel response consists of a channel gain $\beta_m$ and a complex exponential $e^{-j2\pi\frac{d_m - d_1}{\lambda}}$ containing a phase-shift proportional to $(d_m - d_1)/\lambda$. This is not the absolute phase-shift of the propagation but the relative phase-shift compared to the reference antenna. We can collect all the channel responses in the channel vector

$$
\mathbf{h} = \begin{bmatrix} h_1 \\ \vdots \\ h_M \end{bmatrix} = \begin{bmatrix} \sqrt{\beta_1} \\ \vdots \\ \sqrt{\beta_M}\,e^{-j2\pi\frac{d_M - d_1}{\lambda}} \end{bmatrix}, \tag{4.11}
$$

where we recall that $\beta_m = \frac{\lambda^2}{(4\pi)^2}\frac{1}{d_m^2}$ for $m = 1,\ldots,M$.

¹ In practice, it is preferable to select antenna 1 to minimize the maximum separation to all other antennas. For example, if the array has a square shape as in Figure 4.1(c), we should pick the antenna in the center as the timing reference, instead of an antenna in one of the corners as in the worst-case scenario. This will reduce the maximum delay from $D/c$ to $D/(2c)$. However, to obtain expressions that resemble those in other textbooks, we will nevertheless use one of the corners as the reference antenna in this chapter.
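The channel vector in (4.11) only requires the positions of the transmitter and the receive antennas. The sketch below computes it for an arbitrary receiver geometry. The following are illustrative assumptions:

- the function name `los_simo_channel`;
- the 3 GHz carrier;
- the $4\times 4$ planar array with half-wavelength spacing;
- the transmitter position.

The first row of `rx_pos` acts as the reference antenna.

```python
import numpy as np

def los_simo_channel(tx_pos, rx_pos, wavelength):
    """Free-space LOS SIMO channel vector as in (4.3), (4.10), and (4.11).

    tx_pos: shape (3,), transmitter position in meters.
    rx_pos: shape (M, 3), receive antenna positions; row 0 is the reference antenna.
    """
    d = np.linalg.norm(rx_pos - tx_pos, axis=1)              # distances d_1, ..., d_M
    beta = wavelength**2 / ((4 * np.pi) ** 2 * d**2)          # free-space gains (4.3)
    h = np.sqrt(beta) * np.exp(-2j * np.pi * (d - d[0]) / wavelength)  # entries (4.10)
    return h, beta

wavelength = 3e8 / 3e9                                        # 0.1 m at 3 GHz
grid = np.arange(4) * wavelength / 2                          # half-wavelength spacing
rx_pos = np.array([[0.0, y, z] for z in grid for y in grid])  # 16 antennas in the y-z plane
tx_pos = np.array([50.0, 10.0, 5.0])                          # transmitter position (m)

h, beta = los_simo_channel(tx_pos, rx_pos, wavelength)
print(np.round(np.angle(h[:4]), 2))   # relative phase-shifts (rad) along the first row
```

The first entry is always real and positive since its phase is measured relative to itself. The squared magnitudes reproduce the gains $\beta_m$.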


The approximation that we utilized above results in frequency flatness 
since the impulse response in (4.4) has effectively been approximated as 


$$
g_m(t) = h_m\,\delta(t), \quad m = 1,\ldots,M, \tag{4.12}
$$


which has a Fourier transform with the constant value hm across all frequencies. 
We can utilize the derived expression in (4.11) when dealing with any practical 
receiver array of limited size. When the array has a regular geometrical 
structure, it can be utilized to simplify the expression. A particular example 
will be considered next. 


4.2.1 Uniform Linear Array at the Receiver 


One type of antenna array is particularly common to deploy and analyze: 
the uniform linear array (ULA). In this array type, the M antennas are 
deployed with uniform spacing, and the centers are located on a straight 
line, as in Figure 4.1(b). We let A denote the spacing between the centers 
of any two adjacent antennas. The spacing between the centers of the two 
outermost antennas will then be (M — 1)A. The total length of the ULA, 
measured between the outer edges of the outermost antennas, depends on the 
physical width of the individual antennas, which depends on the hardware 
implementation. For convenience, we will denote the aperture length of the 
ULA as MA, because this expression will appear in many expressions derived 
in this chapter. A setup with a receiving ULA is shown in Figure 4.3. We 
continue to use receive antenna 1 as the reference point and define the angle- 
of-arrival p E€ [—7,7) of the impinging signal at this antenna, as shown 
in the figure. More precisely, we consider a two-dimensional plane (in the 
three-dimensional world) that contains the ULA and transmitter, and define 
angles in that plane.? Note that y = 0 corresponds to a transmitter on a 
line perpendicular to the line where the ULA is deployed. This is called the 
broadside or front-fire direction of the array, which is a terminology borrowed 
from how the canons on a warship are lined up to fire toward the sides. Two 


²This assumption can be made without loss of generality. A plane is defined by two linearly independent vectors that lie in the plane; thus, we can create the plane by selecting one vector pointing along the ULA and the other vector pointing from the reference antenna to the transmitter.
Figure 4.3: Illustration of communication from a single-antenna transmitter to a receiver equipped with a ULA. The antenna spacing is $\Delta$, and the distance to receive antenna $m$ is $d_m$ for $m = 1, \ldots, M$. The angle-of-arrival $\varphi$ is measured at the first antenna. The transmitter is in the broadside direction if $\varphi = 0$, while it is in the end-fire direction if $\varphi = \pm\pi/2$.


Figure 4.4: The distance $d_m$ between the transmitter and the $m$th receive antenna can be computed based on $d_1$, $\Delta$, and $\varphi$ using the law of cosines.


other important directions are $\varphi = \pm\pi/2$, where the transmitter is on the same line as the ULA. These are called the end-fire directions of the array. We can now use the geometry to compute the distance $d_m$ to the $m$th antenna as a function of $d_1$, $\varphi$, and $\Delta$. Their relationship is illustrated in Figure 4.4, and we can utilize the law of cosines (together with $\cos(\varphi + \pi/2) = -\sin(\varphi)$) to establish the relationship

$$ d_m^2 = d_1^2 + (m-1)^2\Delta^2 - 2d_1(m-1)\Delta\cos\left(\varphi + \frac{\pi}{2}\right) = d_1^2 + (m-1)^2\Delta^2 + 2d_1(m-1)\Delta\sin(\varphi). \tag{4.13} $$


This difference in propagation distance affects both the channel gain $\beta_m$ and the phase-shift $2\pi(d_m - d_1)/\lambda$ in (4.10). In many cases of practical interest, it holds that $d_1 \gg M\Delta$, which means that the distance between the transmitter and the first antenna is much larger than the aperture length of the ULA. Since the channel gain depends on the total distance, it then follows that


$$ \beta_m = \frac{\lambda^2}{(4\pi)^2}\,\frac{1}{d_1^2 + (m-1)^2\Delta^2 + 2d_1(m-1)\Delta\sin(\varphi)} \approx \frac{\lambda^2}{(4\pi)^2 d_1^2} \tag{4.14} $$

for $m = 1, \ldots, M$. This means the channel gain is approximately the same for all antennas in most free-space LOS scenarios. For simplicity, we will use the notation

$$ \beta = \frac{\lambda^2}{(4\pi)^2 d^2} \tag{4.15} $$

without an antenna index to denote the common channel gain of all antennas, where $d = d_1$ denotes the distance to the reference antenna.

In contrast, the phase-shift $2\pi(d_m - d_1)/\lambda$ at the $m$th antenna depends on the relative distance $d_m - d_1$ between the $m$th and first antenna, and this variation cannot be neglected, even if the total distance is considerable. Recall from Section 1.1.2 that the transmit antenna emits a spherical wave, which can be approximated as a plane wave when the receive antenna is beyond the Fraunhofer distance defined in (1.18). The same argument can be applied when considering a receiving ULA, but the aperture length of the ULA should be considered instead of the width of a single receive antenna. More specifically, the impinging wave will have an approximately planar wavefront if

$$ d_1 \geq \frac{2M^2\Delta^2}{\lambda}. \tag{4.16} $$

When this holds, we say that the ULA is in the far field of the transmitter.

The far-field condition is often satisfied in practice; for example, if the ULA has an aperture length of 1 m and $\lambda = 0.1$ m (i.e., $f_c = 3$ GHz), then we need $d_1 \geq 20$ m to be in the far field, which is typically the case (at least in the practical scenarios where such large arrays are used). Moreover, since (4.16) can be rewritten as $d_1 \geq M\Delta \cdot (2M\Delta/\lambda)$, it generally implies $d_1 \gg M\Delta$, as assumed in (4.14) when approximating the channel gain, because $2M\Delta/\lambda \gg 1$ for most arrays.
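The following Python sketch offers a quick numerical check of (4.16) by computing the far-field distance $2M^2\Delta^2/\lambda$ for a given array. The names `far_field_distance` and `in_far_field` are hypothetical helpers introduced only for illustration. The choice of $M = 20$ antennas with $\Delta = \lambda/2$, which reproduces the 1 m aperture from the text, is also an assumption.

```python
def far_field_distance(M: int, spacing: float, wavelength: float) -> float:
    """Right-hand side of (4.16): 2 * (M * Delta)**2 / lambda, in meters."""
    aperture = M * spacing          # aperture length M * Delta
    return 2 * aperture**2 / wavelength


def in_far_field(d1: float, M: int, spacing: float, wavelength: float) -> bool:
    """True if the distance d1 to the reference antenna satisfies (4.16)."""
    return d1 >= far_field_distance(M, spacing, wavelength)


wavelength = 0.1                    # lambda = 0.1 m, i.e., f_c = 3 GHz
M, spacing = 20, 0.05               # assumed: 20 antennas at lambda/2 -> 1 m aperture
print(f"far-field distance: {far_field_distance(M, spacing, wavelength):.1f} m")
# (4.16) reads d1 >= M*Delta * (2*M*Delta/lambda); the second factor is >> 1 here
print(f"2*M*Delta/lambda = {2 * M * spacing / wavelength:.1f}")
print(in_far_field(50.0, M, spacing, wavelength), in_far_field(10.0, M, spacing, wavelength))
```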

The difference between spherical and planar wavefronts is illustrated in Figure 4.5, which shows snapshots of sinusoidal waves propagating in the $xy$-plane. The shape of the wavefront is seen by inspecting the points that attain the maximum value at the same time: these points lie on circular curves in Figure 4.5(a) and on straight lines in Figure 4.5(b). When this example is extended to wave propagation in three dimensions, the circular curves become spheres, while the straight lines become planes. The wave in Figure 4.5(b) propagates along the $x$-axis. The spatial frequency is zero along the wavefront (i.e., there is no phase difference along the $y$-axis). We recall from Section 2.8.3 that the spatial frequency is $\pm 1/\lambda$ in the direction the wave propagates.

Even if the impinging wavefronts are planar, the receiving ULA might not 
be deployed in a direction that matches the wavefronts. In particular, the 


(a) Spherical wavefronts.
(b) Planar wavefronts.


Figure 4.5: Example of two sinusoidal waves propagating in the xy-plane. The vertical axis 
shows the value at a particular time instance. The shape of the wavefronts can be seen by 
drawing lines between the neighboring points that attain the same value simultaneously. The 
wavefronts are spherical in (a) and planar in (b), represented by circular and straight lines in 
this two-dimensional example. 
Figure 4.6: The isotropic transmitter emits spherical waves, which look like planar waves when the receiver is far from the transmitter. The angle-of-arrival $\varphi$ is approximately the same for all antennas, and the difference in propagation distance between antenna 1 and antenna $m$ is $(m-1)\Delta\sin(\varphi)$.


distances to the receive antennas will differ when the wave arrives from a non-broadside direction with angle $\varphi \notin \{0, \pm\pi\}$. As shown in Figure 4.6, the difference in propagation distance between the first and the $m$th antenna can be computed using trigonometry when the wavefronts are planar:

$$ d_m - d_1 = (m-1)\Delta\sin(\varphi). \tag{4.17} $$


This follows from the right triangle in Figure 4.6, whose hypotenuse is $(m-1)\Delta$ and in which the side of length $d_m - d_1$ is opposite the angle $\varphi$. The phase difference between the considered antennas is $2\pi(m-1)\Delta\sin(\varphi)/\lambda$. As the distance between the antennas is $(m-1)\Delta$, the phases of the signals observed simultaneously at the different antennas in the ULA vary with a spatial frequency of $\sin(\varphi)/\lambda$ periods per meter. The information-bearing signal still oscillates in time at the temporal frequency $f_c$, but the relative phase difference between the antennas remains constant and is determined by the spatial frequency. Hence, it is the (spatial) channel vector $\mathbf{h}$ that contains this spatial frequency, not the signal. One can view it as the spatial counterpart to how the (temporal) impulse response of the channel has a frequency response containing a collection of different frequencies. For the considered ULA, the channel contains the zero spatial frequency when the array is deployed parallel to the wavefronts (i.e., $\varphi \in \{0, \pm\pi\}$). Similarly, the channel contains the spatial frequency $\pm 1/\lambda$ when the array is deployed along the direction of propagation, which is the case when $\varphi = \pm\pi/2$.

We have now derived far-field approximations of the channel gain and phase-shifts when using a ULA. By substituting (4.14) and (4.17) into the general expression for $h_m$ in (4.10), we obtain

$$ h_m = \sqrt{\beta_m}\, e^{-j2\pi\frac{d_m - d_1}{\lambda}} \approx \sqrt{\beta}\, e^{-j2\pi\frac{(m-1)\Delta\sin(\varphi)}{\lambda}}, \quad m = 1, \ldots, M. \tag{4.18} $$


Figure 4.7: The channel vector in (4.19) depends on $\sin(\varphi)$, where $\varphi$ is the angle-of-arrival of the impinging planar wavefront. The same channel vector is obtained if a planar wavefront impinges from the alternative angle $\pi - \varphi$, which leads to a mirror-like ambiguity.


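This expression is unique to ULAs because it explicitly utilizes how the antennas are located with respect to each other. In summary, in the free-space SIMO channel with a ULA, the (approximate) channel vector is

$$ \mathbf{h} = \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ \vdots \\ h_M \end{bmatrix} = \sqrt{\beta}\begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta\sin(\varphi)}{\lambda}} \\ e^{-j2\pi\frac{2\Delta\sin(\varphi)}{\lambda}} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta\sin(\varphi)}{\lambda}} \end{bmatrix}. \tag{4.19} $$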


Two variables determine the transmitter's location: the channel gain $\beta$ (through the distance) and the angle $\varphi$. These variables affect $\mathbf{h}$ differently. The norm $\|\mathbf{h}\| = \sqrt{M\beta}$ depends only on the channel gain, while the direction $\mathbf{h}/\|\mathbf{h}\|$ depends only on the angle $\varphi$. This is a characteristic feature of far-field propagation.
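The following Python sketch builds the far-field ULA channel vector in (4.19) and checks the two properties just stated: $\|\mathbf{h}\|^2 = M\beta$ depends only on the channel gain, while $\mathbf{h}/\|\mathbf{h}\|$ depends only on $\varphi$. It uses only the standard library. The function name `ula_channel` and the values of $\beta$, $M$, and $\varphi$ are assumptions made for illustration. Indices start from 0 in the code, so code index `m` corresponds to antenna $m+1$ in the text.

```python
import cmath
import math


def ula_channel(beta: float, phi: float, M: int, spacing: float, wavelength: float) -> list:
    """Far-field ULA channel vector (4.19).

    Entry m (0-based) is sqrt(beta) * exp(-j*2*pi*m*spacing*sin(phi)/wavelength).
    """
    return [
        math.sqrt(beta) * cmath.exp(-2j * math.pi * m * spacing * math.sin(phi) / wavelength)
        for m in range(M)
    ]


def norm(v: list) -> float:
    """Euclidean norm of a complex vector stored as a list."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


wavelength = 0.1
spacing = wavelength / 2
M, phi = 8, math.pi / 6

h_strong = ula_channel(1e-6, phi, M, spacing, wavelength)   # assumed channel gains
h_weak = ula_channel(1e-8, phi, M, spacing, wavelength)     # same angle, weaker channel

print(norm(h_strong) ** 2 / (M * 1e-6))    # ratio ||h||^2 / (M*beta), close to 1
u_strong = [x / norm(h_strong) for x in h_strong]
u_weak = [x / norm(h_weak) for x in h_weak]
print(max(abs(a - b) for a, b in zip(u_strong, u_weak)))   # directions agree up to rounding
```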

The angle-of-arrival is measured from the broadside direction, as indicated in Figure 4.6, and can take any value in $[-\pi, \pi)$. The channel vector in (4.19) depends on this angle only through its sine, which creates an ambiguity because $\sin(\varphi)$ is not a bijective function for $\varphi \in [-\pi, \pi)$. More precisely, $\sin(\varphi) = \sin(\pi - \varphi)$ for any $\varphi$, which implies that every feasible channel vector can be obtained by two different angles-of-arrival (except in the end-fire directions $\varphi = \pm\pi/2$, which are their own mirror images). This happens for pairs of incident wavefronts that are each other's mirror reflections, as illustrated in Figure 4.7. This is the reception counterpart of the phenomenon previously illustrated in Figure 1.17 and Figure 1.19: when a ULA with isotropic antennas beamforms in one angular direction, it will also beamform in the mirror-reflected direction. When we continue analyzing ULAs in this chapter, we will mostly consider signals arriving from (or transmitted into) the half-space represented by $\varphi \in [-\pi/2, \pi/2]$. There are two main reasons for this. Firstly, we can illustrate the beamforming concepts more clearly since there will mainly be one beam direction. Secondly, many ULAs deployed in practice use directive antennas that only radiate signals into the directions given by $\varphi \in [-\pi/2, \pi/2]$. The cosine antenna in Figure 1.10 has this property and is suitable for deployment on a wall to cover the half-space in front of the wall. When considering the half-space represented by $\varphi \in [-\pi/2, \pi/2]$, we span the entire range of spatial frequencies from $-1/\lambda$ (when $\varphi = -\pi/2$) to $1/\lambda$ (when $\varphi = \pi/2$) that the channel vector can contain. In other words, we can distinguish between signals impinging from different angles on the same side of the array because they give rise to channel vectors containing different spatial frequencies. However, we cannot uniquely distinguish these signals from their respective mirror reflections.
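The mirror ambiguity can be checked numerically. In the Python sketch below, the hypothetical helper `array_response` evaluates the channel vector of (4.19) divided by $\sqrt{\beta}$, for an assumed spacing of $\Delta = \lambda/2$. The vectors for $\varphi$ and $\pi - \varphi$ coincide up to rounding errors. Two different angles within $[-\pi/2, \pi/2]$, by contrast, give clearly different vectors.

```python
import cmath
import math


def array_response(phi: float, M: int, spacing_in_wavelengths: float = 0.5) -> list:
    """Channel vector of (4.19) divided by sqrt(beta), for the spacing Delta/lambda."""
    return [cmath.exp(-2j * math.pi * m * spacing_in_wavelengths * math.sin(phi)) for m in range(M)]


def max_difference(u: list, v: list) -> float:
    """Largest entrywise distance between two complex vectors."""
    return max(abs(x - y) for x, y in zip(u, v))


M, phi = 8, math.pi / 6
mirror = math.pi - phi      # mirror-reflected angle, behind the array
print(max_difference(array_response(phi, M), array_response(mirror, M)))   # rounding-level
print(max_difference(array_response(0.1, M), array_response(0.2, M)))      # clearly nonzero
```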


Example 4.2. Consider a ULA with antenna spacing $\Delta = \lambda/2$ designed for the carrier frequency $f_c = 3$ GHz and bandwidth $B = 20$ MHz. The aperture length is $15\lambda$ (i.e., the maximum length from Example 4.1) and the transmitter is located at a distance $d_1 = 50$ m in the angular direction $\varphi = \pi/6$.


(a) What is the number of antennas, M, in the array? 


(b) Compute the channel gains of the outermost antennas using the exact 
formula in (4.14) and comment on the differences. 


(c) Compute and compare the two expressions in (4.17) for m = M. 
The aperture length of the ULA is $M\Delta = M\lambda/2$ in this example.

(a) The length is specified to satisfy $M\lambda/2 = 15\lambda$, which implies $M = 30$.

(b) Using $\sin(\pi/6) = 0.5$, $\lambda = 0.1$ m ($f_c = 3$ GHz), $d_1 = 50$ m, and $\Delta = \lambda/2 = 0.05$ m, the squared distance in (4.13) to the last antenna becomes

$$ d_M^2 = 50^2 + 29^2 \cdot 0.05^2 + 2 \cdot 50 \cdot 29 \cdot 0.05 \cdot 0.5 \approx 2575 \text{ m}^2. \tag{4.20} $$


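The channel gains can now be computed using (4.14) as

$$ \beta_1 = \frac{0.1^2}{(4\pi)^2 \cdot 50^2} \approx 2.53 \cdot 10^{-8} \approx -76.0 \text{ dB}, \tag{4.21} $$

$$ \beta_M = \frac{0.1^2}{(4\pi)^2 \cdot 2575} \approx 2.46 \cdot 10^{-8} \approx -76.1 \text{ dB}. \tag{4.22} $$

We notice that $\beta_1$ and $\beta_M$ differ by as little as 0.1 dB; thus, the far-field approximation of the channel gain is highly accurate.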


(c) The exact distance difference in the left-hand side of (4.17) is $d_M - d_1 \approx \sqrt{2574.6} - 50 \approx 0.741$ m. The far-field approximation in the right-hand side of (4.17) becomes $(M-1)\Delta\sin(\varphi) = 29 \cdot 0.05 \cdot 0.5 = 0.725$ m. The approximation error is around 0.016 m, which is less than one-sixth of the wavelength; thus, the far-field approximation is fairly accurate.
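The numbers in Example 4.2 can be reproduced with the Python sketch below. It evaluates the exact squared distance (4.13), the exact channel gains (4.14), and both sides of (4.17). The variable and helper names are chosen for illustration only.

```python
import math

wavelength = 0.1                          # f_c = 3 GHz
spacing = wavelength / 2                  # Delta = lambda/2
M = round(15 * wavelength / spacing)      # (a) M * Delta = 15 * lambda  ->  M = 30
d1, phi = 50.0, math.pi / 6


def gain(d_squared: float) -> float:
    """Free-space channel gain lambda^2 / ((4*pi)^2 * d^2), as in (4.14)."""
    return wavelength**2 / ((4 * math.pi) ** 2 * d_squared)


def to_db(x: float) -> float:
    return 10 * math.log10(x)


# (b) exact squared distance to antenna M from (4.13), cf. (4.20)
dM_sq = d1**2 + ((M - 1) * spacing) ** 2 + 2 * d1 * (M - 1) * spacing * math.sin(phi)
beta_1, beta_M = gain(d1**2), gain(dM_sq)
print(f"M = {M}, d_M^2 = {dM_sq:.1f} m^2")
print(f"beta_1 = {to_db(beta_1):.2f} dB, beta_M = {to_db(beta_M):.2f} dB")

# (c) exact distance difference versus the far-field approximation in (4.17)
exact = math.sqrt(dM_sq) - d1
approx = (M - 1) * spacing * math.sin(phi)
error = exact - approx
print(f"exact: {exact:.4f} m, far-field: {approx:.4f} m, "
      f"error: {error:.4f} m ({error / wavelength:.2f} wavelengths)")
```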


As explained in Section 2.8.3, it is common to separate adjacent antennas by half a wavelength to obtain spatial samples at twice the maximum spatial frequency $1/\lambda$ that the channel might contain. This corresponds to the antenna spacing $\Delta = \lambda/2$. If we substitute this value into (4.19), it simplifies to

$$ \mathbf{h} = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_M \end{bmatrix} = \sqrt{\beta}\begin{bmatrix} 1 \\ e^{-j\pi\sin(\varphi)} \\ \vdots \\ e^{-j\pi(M-1)\sin(\varphi)} \end{bmatrix}. \tag{4.23} $$

We will analyze the impact of other antenna spacings in Section 4.3.4.

4.2.2 SIMO Channel Capacity with ULA 


The capacity of a SIMO channel was presented in (3.23) as

$$ C = B\log_2\left(1 + \frac{P\|\mathbf{h}\|^2}{BN_0}\right) \text{ bit/s}. \tag{4.24} $$


For a ULA with $\mathbf{h}$ given by (4.19), we have $\|\mathbf{h}\|^2 = M\beta$, which is independent of the antenna spacing. By substituting this value into (4.24), we obtain

$$ C = B\log_2\left(1 + \frac{PM\beta}{BN_0}\right) \text{ bit/s}. \tag{4.25} $$


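If we compare this expression with the SISO capacity $B\log_2\left(1 + \frac{P\beta}{BN_0}\right)$ from (2.146), we notice that the SNR is $M$ times larger in the SIMO case. This is the beamforming gain obtained when receiving the same signal at $M$ antennas and optimally combining the observations using MRC. In this case, the MRC vector in (3.19) becomes

$$ \mathbf{w} = \frac{\mathbf{h}}{\|\mathbf{h}\|} = \frac{1}{\sqrt{M}}\begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta\sin(\varphi)}{\lambda}} \\ e^{-j2\pi\frac{2\Delta\sin(\varphi)}{\lambda}} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta\sin(\varphi)}{\lambda}} \end{bmatrix}. \tag{4.26} $$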


Since the channel gain is the same for all receive antennas, all the elements in $\mathbf{w}$ have the same magnitude. In other words, all antennas contribute equally to improving the SNR achieved over free-space LOS channels. MRC rotates the phases of the received signals so that $\mathbf{w}^{\mathrm{H}}\mathbf{h}$ becomes a sum of $M$ positive terms, each equal to $\sqrt{\beta/M}$. Recall that the phase-shifts in the channel vector are caused by the different propagation delays to the different receive antennas. Since a conjugate transpose is applied to the MRC vector when multiplying it with the channel vector in (3.17), MRC compensates for these delay variations. The result is essentially the same as if the received signal had been sampled at slightly different times at the different antennas.
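The Python sketch below forms the MRC vector $\mathbf{w} = \mathbf{h}/\|\mathbf{h}\|$ in (4.26) for an assumed ULA with $\Delta = \lambda/2$. It confirms that each term of $\mathbf{w}^{\mathrm{H}}\mathbf{h}$ equals $\sqrt{\beta/M}$ and compares the resulting SNR with the single-antenna SNR. The power $P$, bandwidth $B$, and noise power spectral density $N_0$ are illustrative assumptions, not values from the text.

```python
import cmath
import math

M, beta, phi = 10, 2.5e-8, math.pi / 6      # assumed example values
P, B, N0 = 0.1, 20e6, 4e-21                  # assumed: 0.1 W, 20 MHz, about -174 dBm/Hz

# Channel vector (4.19) with Delta = lambda/2, and MRC vector (4.26)
h = [math.sqrt(beta) * cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]
h_norm = math.sqrt(sum(abs(x) ** 2 for x in h))
w = [x / h_norm for x in h]

# Each term conj(w_m) * h_m is the same positive real number sqrt(beta / M)
terms = [wm.conjugate() * hm for wm, hm in zip(w, h)]
combined = sum(terms)                        # w^H h = sqrt(M * beta)

snr_siso = P * beta / (B * N0)
snr_simo = P * abs(combined) ** 2 / (B * N0)
print(f"SNR gain: {snr_simo / snr_siso:.2f}")                    # beamforming gain M
print(f"SIMO capacity (4.25): {B * math.log2(1 + snr_simo) / 1e6:.1f} Mbit/s")
```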


Example 4.3. How does the SIMO capacity in (4.25) depend on the wavelength $\lambda$ if the number of antennas is fixed?

The capacity expression depends on the wavelength $\lambda$ through $\beta = \frac{\lambda^2}{(4\pi)^2 d^2}$, which was defined in (4.15). Hence, we can express (4.25) as

$$ C = B\log_2\left(1 + \frac{PM\lambda^2}{BN_0(4\pi)^2 d^2}\right). \tag{4.27} $$

If $M$ is constant, then the capacity in (4.27) is an increasing function of $\lambda$, since the SNR is proportional to $\lambda^2$. This implies that the capacity is larger when using low-band spectrum than with high-band spectrum. The reason is that the ULA consists of $M$ isotropic antennas with the wavelength-dependent area $\lambda^2/(4\pi)$ from (1.3). The strength of the electric field that impinges on the ULA is independent of the wavelength, but the array captures less power as the individual receive antennas shrink in size when $\lambda$ is reduced.
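The Python sketch below illustrates Example 4.3 numerically by evaluating the SNR inside (4.27) at 3 GHz and 30 GHz with everything else fixed. The transmit power, distance, bandwidth, and noise level are assumed values. Only the ratio between the two SNRs matters here, and it equals $(0.1/0.01)^2 = 100$, i.e., 20 dB.

```python
import math

SPEED_OF_LIGHT = 3e8                          # m/s (rounded)


def simo_snr(P, M, wavelength, d, B, N0):
    """SNR inside (4.27): P*M*lambda^2 / (B*N0*(4*pi)^2*d^2)."""
    beta = wavelength**2 / ((4 * math.pi) ** 2 * d**2)
    return P * M * beta / (B * N0)


params = dict(P=0.1, M=16, d=100.0, B=20e6, N0=4e-21)   # assumed values
for fc in (3e9, 30e9):
    snr = simo_snr(wavelength=SPEED_OF_LIGHT / fc, **params)
    capacity = params["B"] * math.log2(1 + snr)
    print(f"f_c = {fc / 1e9:.0f} GHz: SNR = {10 * math.log10(snr):.1f} dB, "
          f"C = {capacity / 1e6:.1f} Mbit/s")

ratio_db = 10 * math.log10(simo_snr(wavelength=0.1, **params) / simo_snr(wavelength=0.01, **params))
print(f"SNR difference: {ratio_db:.1f} dB")
```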


4.2.3 Array Factor and Spatial Filtering 


In LOS communications, MRC acts as a spatial filter that attenuates any component of the received signal that arrives from an angle that is (substantially) different from $\varphi$. This applies to noise as well as interfering signals. The channel vector in (4.19) can be expressed as $\mathbf{h} = \sqrt{\beta}\,\mathbf{a}(\varphi)$, where the vector

$$ \mathbf{a}(\varphi) = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta\sin(\varphi)}{\lambda}} \\ e^{-j2\pi\frac{2\Delta\sin(\varphi)}{\lambda}} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta\sin(\varphi)}{\lambda}} \end{bmatrix} \tag{4.28} $$

is called the array response vector or steering vector. This vector depends on the angle-of-arrival $\varphi$ through the function $\sin(\varphi)/\lambda$, which we recognize as the spatial frequency the channel vector contains. If two signals arrive from vastly different angular directions, their respective channel vectors contain vastly different spatial frequencies. Consequently, their respective array response vectors point in vastly different directions in the vector space $\mathbb{C}^M$.

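To give a concrete example, suppose the desired signal arrives from $\varphi = 0$. The MRC vector in (4.26) then becomes $\mathbf{w} = \mathbf{a}(0)/\sqrt{M} = [1, \ldots, 1]^{\mathrm{T}}/\sqrt{M}$. If an interfering signal $\sqrt{\beta_{\text{interf}}}\, x_{\text{interf}}$ reaches the reference antenna from the angle $\varphi_{\text{interf}}$, then the signals reaching the antennas are given by

$$ \sqrt{\beta_{\text{interf}}}\, x_{\text{interf}}\, \mathbf{a}(\varphi_{\text{interf}}), \tag{4.29} $$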
Figure 4.8: An MRC filter designed for an incoming signal from the angle 0 acts as a spatial filter that attenuates any interfering signal arriving from an angle much different from 0. There are $M = 10$ antennas in this example, so the maximum beamforming gain is 10.


since the array response vector determines the relative phase-shift compared to the reference antenna. If we now apply MRC to (4.29), the resulting scalar signal is

$$ \mathbf{w}^{\mathrm{H}}\left(\sqrt{\beta_{\text{interf}}}\, x_{\text{interf}}\, \mathbf{a}(\varphi_{\text{interf}})\right) = \underbrace{\sqrt{\beta_{\text{interf}}}\, x_{\text{interf}}}_{\text{Interfering signal}}\, \underbrace{\mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi_{\text{interf}})}_{\text{Array factor}}. \tag{4.30} $$

This is the original interfering signal $\sqrt{\beta_{\text{interf}}}\, x_{\text{interf}}$ at the reference antenna multiplied by the factor $\mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi_{\text{interf}})$, which is the inner product between the MRC vector and the array response vector for the direction that the interfering signal arrives from. This inner product is called the array factor and determines how the array as a whole amplifies/attenuates and phase-shifts the signal by its processing. When talking about spatial filtering, we are interested in the squared magnitude of the array factor:

$$ \left|\mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi_{\text{interf}})\right|^2 = \frac{1}{M}\left|\mathbf{a}^{\mathrm{H}}(0)\,\mathbf{a}(\varphi_{\text{interf}})\right|^2 = \frac{1}{M}\left|\sum_{m=1}^{M} e^{-j2\pi\frac{(m-1)\Delta\sin(\varphi_{\text{interf}})}{\lambda}}\right|^2, \tag{4.31} $$

which determines the relative signal strength compared to the case with a single-antenna receiver. The beamforming gain the filter applies to the interfering signal can attain any value between 0 and $M$. The precise value in (4.31) depends on the angle $\varphi_{\text{interf}}$, as it should for a spatial filter.

Figure 4.8 shows $|\mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi_{\text{interf}})|^2$ for the antenna spacing $\Delta = \lambda/2$, $M = 10$ receive antennas, and varying values of $\varphi_{\text{interf}}$. The MRC vector is designed for a signal arriving from the angle 0, which has a channel vector with the spatial frequency $\sin(0)/\lambda = 0$. There is an angular interval around 0 where MRC will amplify any arriving signal, by at most a factor $M = 10$. However, any interfering signal arriving outside that angular interval will be greatly attenuated. In this sense, MRC acts as a spatial bandpass filter: it only passes through signals from specific spatial directions (i.e., their channel vectors contain spatial frequencies within a specific range). The width of the spatial filter can be quantified analytically as a function of $M$ and $\Delta$, and is generally inversely proportional to the aperture length $M\Delta$. We will postpone the detailed analysis to Section 4.3.2 since this example only intends to introduce spatial filtering from a qualitative perspective.
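The array factor (4.31) behind Figure 4.8 can be evaluated with a few lines of Python. The sketch below assumes $\Delta = \lambda/2$ and MRC designed for $\varphi = 0$, as in the figure. The helper name `array_factor_sq` is hypothetical. For this spacing, the squared array factor reduces to $\sin^2(M\pi s/2)/(M\sin^2(\pi s/2))$ with $s = \sin(\varphi_{\text{interf}})$. The gain is therefore $M$ at $\varphi_{\text{interf}} = 0$, and the first zero occurs at $\sin(\varphi_{\text{interf}}) = 2/M$, which is about $11.5^\circ$ for $M = 10$.

```python
import cmath
import math


def array_response(phi: float, M: int) -> list:
    """a(phi) from (4.28) with Delta = lambda/2 (assumed)."""
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


def array_factor_sq(phi_interf: float, M: int, phi_design: float = 0.0) -> float:
    """|w^H a(phi_interf)|^2 in (4.31), with MRC vector w = a(phi_design) / sqrt(M)."""
    w = [x / math.sqrt(M) for x in array_response(phi_design, M)]
    inner = sum(wm.conjugate() * am for wm, am in zip(w, array_response(phi_interf, M)))
    return abs(inner) ** 2


M = 10
for deg in (0, 5, 10, 11.54, 20, 45, 90):
    print(f"{deg:6.2f} deg -> {array_factor_sq(math.radians(deg), M):7.4f}")
```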

The derivations in this section have assumed isotropic antennas but can also be applied when using directional antennas. The generalization is obtained by including the antenna gains in $\beta$, following the approach in Section 1.1.4, and will be provided later in Section 4.5.


4.2.4 Acquiring Channel State Information 


The channel vector $\mathbf{h}$ completely characterizes a deterministic frequency-flat channel, irrespective of whether the general model in (4.11) or the ULA-specific model in (4.19) is used. To achieve the SIMO channel capacity in (4.24),
the receiver must know h so that it can first apply MRC and then decode the 
data symbols. Moreover, the transmitter must know the capacity value C to 
encode the data accordingly. Thus far, we have assumed this information to be 
available automatically, but an acquisition mechanism is required in practice. 
The vector h is often referred to as the channel state, while the available 
knowledge of it is called the channel state information (CSI). It is sometimes 
necessary to distinguish between the CSI available at the transmitter and 
the receiver if these are substantially different. When communicating over 
deterministic channels, as in this chapter, it is common to assume that perfect 
CSI is available at both the transmitter and receiver; that is, the channel 
vector is known precisely. The goal of this section is to justify that statement. 

Suppose we transmit a packet of $L$ symbols $\{x[l] : l = 1, \ldots, L\}$ over the discrete memoryless SIMO channel

$$ \mathbf{y}[l] = \mathbf{h}\,x[l] + \mathbf{n}[l], \quad l = 1, \ldots, L, \tag{4.32} $$

where $\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the receiver noise. We consider the case when $\mathbf{h}$ is unknown at the receiver when initiating the transmission. Since the received signal in (4.32) contains products $\mathbf{h}x[l]$ between the channel and the transmitted symbols, it is hard to separate $\mathbf{h}$ and $x[l]$ if they are both unknown. To resolve this ambiguity, we can divide the packet into two parts:


1. A preamble part with $L_p$ predefined symbols that enables estimation of $\mathbf{h}$ since $x[l]$ is known for $l = 1, \ldots, L_p$;

2. A payload part with $L - L_p$ symbols where detection of the random data symbols $x[l]$ (for $l = L_p + 1, \ldots, L$) is possible since $\mathbf{h}$ is now known.


Figure 4.9: A data packet contains a preamble that can be used for channel estimation and a 
payload with data symbols. 


Such a packet is illustrated in Figure 4.9. We have analyzed the second part when characterizing the capacity; thus, we will focus on the preamble in this section. The preamble is often called a pilot sequence since it tests the channel quality before the data transmission commences. To comply with the symbol power constraint $\mathbb{E}\{|x[l]|^2\} \leq q$, we can utilize the constant symbols

$$ x[l] = \sqrt{q}, \quad l = 1, \ldots, L_p, \tag{4.33} $$


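in the preamble. Suppose we compute the sample average of the received preamble signals (divided by $\sqrt{q}$):

$$ \frac{1}{L_p\sqrt{q}}\sum_{l=1}^{L_p}\mathbf{y}[l] = \mathbf{h}\,\frac{1}{L_p\sqrt{q}}\sum_{l=1}^{L_p}\sqrt{q} + \frac{1}{L_p\sqrt{q}}\sum_{l=1}^{L_p}\mathbf{n}[l] = \mathbf{h} + \mathbf{n}', \tag{4.34} $$

where $\mathbf{n}' = \frac{1}{L_p\sqrt{q}}\sum_{l=1}^{L_p}\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}\left(\mathbf{0}, \frac{N_0}{L_p q}\mathbf{I}_M\right)$ since we are computing a weighted sum of independent noise vectors whose entries each have variance $N_0$. We notice that (4.34) equals the channel vector $\mathbf{h}$ plus a noise vector $\mathbf{n}'$ whose entries have a variance $\frac{N_0}{L_p q}$ that is inversely proportional to the length $L_p$ of the preamble (called a processing gain). The noise variance goes to zero as $L_p \to \infty$, and so do all random realizations of $\mathbf{n}'$ that are likely to occur.³ Hence, $\frac{1}{L_p\sqrt{q}}\sum_{l=1}^{L_p}\mathbf{y}[l] \to \mathbf{h}$ as $L_p \to \infty$, so that the receiver has acquired a perfect estimate of $\mathbf{h}$.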


This example shows that, by making the preamble sufficiently long, we can achieve any desired CSI accuracy. When quantifying the estimation error for finite values of $L_p$, it is important to relate the magnitude of the error to the magnitude of the channel. This can be done by considering the NMSE metric from (2.160), which in this context becomes

$$ \mathrm{NMSE} = \frac{\mathbb{E}\{\|\mathbf{n}'\|^2\}}{\mathbb{E}\{\|\mathbf{h}\|^2\}} = \frac{MN_0/(L_p q)}{M\beta} = \frac{N_0}{L_p q\beta}. \tag{4.35} $$

The NMSE is a decreasing function of the channel gain $\beta$, so we can generally use shorter preambles when the channel is strong. Moreover, the NMSE in


³The Gaussian distribution has unbounded support; thus, it can give rise to arbitrarily large realizations, but the probability of them occurring goes to zero as $L_p \to \infty$. It is also important to remember that the Gaussian modeling of receiver noise is approximate since we cannot get arbitrarily large noise realizations in practice.
(4.35) is independent of the number of antennas; thus, it is equally easy/hard to estimate the channel vector to a ULA with $M = 100$ antennas as the channel to a single-antenna receiver. This is because all the receive antennas listen to the same pilot transmission and perform their channel estimation simultaneously.
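A small Monte Carlo experiment in Python illustrates (4.35). The empirical NMSE of the sample-average estimator (4.34) approaches $N_0/(L_p q\beta)$ and does not change with $M$. The ULA channel with $\Delta = \lambda/2$ and $\varphi = \pi/6$, the parameter values, and the function name are assumptions for illustration. The noise is drawn as $\mathcal{N}_{\mathbb{C}}(0, N_0)$ by giving the real and imaginary parts variance $N_0/2$ each.

```python
import cmath
import math
import random


def sample_average_nmse(M, Lp, q, beta, N0, trials=2000, seed=1):
    """Empirical NMSE of the sample-average estimator (4.34), averaged over trials."""
    rng = random.Random(seed)
    sigma = math.sqrt(N0 / 2)            # std of real and imaginary parts of N_C(0, N0)
    h = [math.sqrt(beta) * cmath.exp(-1j * math.pi * m * math.sin(math.pi / 6)) for m in range(M)]
    err = 0.0
    for _ in range(trials):
        acc = [0j] * M
        for _ in range(Lp):              # preamble with x[l] = sqrt(q), see (4.33)
            for m in range(M):
                noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
                acc[m] += h[m] * math.sqrt(q) + noise
        h_hat = [a / (Lp * math.sqrt(q)) for a in acc]
        err += sum(abs(x - y) ** 2 for x, y in zip(h_hat, h))
    return err / (trials * M * beta)     # estimate of E{||n'||^2} / ||h||^2


N0, q, beta, Lp = 1.0, 1.0, 1.0, 10      # assumed values: q*beta/N0 = 0 dB
print(f"theory (4.35): {N0 / (Lp * q * beta):.3f}")
for M in (1, 4, 16):
    print(f"M = {M:2d}: simulated NMSE = {sample_average_nmse(M, Lp, q, beta, N0):.3f}")
```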


If we switch focus to the payload part, we recall from Definition 2.6 that the channel capacity is achieved “as the number of symbols in the packet approaches infinity”. This implies that we need $L - L_p \to \infty$. However, since only a fraction

$$ \frac{L - L_p}{L} = 1 - \frac{L_p}{L} \tag{4.36} $$

of the packet in Figure 4.9 is used for data symbols, the capacity is obtained by multiplying the fraction in (4.36) with the capacity value computed earlier in this chapter. Interestingly, it is possible to operate the system such that this fraction becomes arbitrarily close to 1. For example, if we let $L_p = \sqrt{L}$, we will get perfect CSI as $L \to \infty$ since this implies $L_p \to \infty$. At the same time, the fraction in (4.36) becomes $1 - L_p/L = 1 - 1/\sqrt{L} \to 1$, so asymptotically there is no loss in capacity from having the preamble. In other words, we can safely assume that the receiver has perfect CSI when evaluating the capacity of deterministic channels because we can simultaneously make the preamble large enough to acquire perfect CSI and negligibly small compared to the packet's total length to avoid a capacity loss.


When the preamble has been transmitted, the receiver can compute the 
channel capacity and feed back this information to the transmitter so that 
it can encode the data accordingly. In practical systems, there is usually a 
predefined table of data rates that the system supports using different MCS, 
such as the one for 5G NR exemplified in Table 2.18. It is then sufficient to 
feed back a few bits to indicate which table entry to utilize (e.g., 5 bits when 
there are 28 rows, as in the table). 


Example 4.4. Suppose we want to transmit 10 kbit of data over an LOS SIMO channel. We have $M = 8$ antennas and the SNR $\frac{q\beta}{N_0} = 10$ dB. How long a preamble is needed to achieve an NMSE of 0.01? How many data symbols are needed if we communicate at the capacity?

We need the NMSE in (4.35) to become 0.01, which for the given SNR value means that $\frac{1}{10L_p} = 0.01$. A preamble of $L_p = 10$ symbols satisfies this requirement.

The SIMO capacity in (4.25) can be expressed as $\log_2\left(1 + \frac{Mq\beta}{N_0}\right) = \log_2(81) \approx 6.34$ bit/symbol since $q = P/B$. We therefore need $\frac{10^4}{6.34} \approx 1577$ symbols to transmit 10 kbit. The packet's total length will be $L \approx 1587$, where 99.4% is used for data and 0.6% for the preamble.
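The arithmetic in Example 4.4 can be sketched in Python as follows. It inverts (4.35) to find the preamble length and uses (4.25), divided by $B$, as the number of bits per symbol for the payload. The number of data symbols is kept non-integer to mirror the rounding in the example; an actual packet would need to round it up.

```python
import math

M, snr_db, nmse_target, payload_bits = 8, 10.0, 0.01, 10_000
snr = 10 ** (snr_db / 10)                        # q * beta / N0 = 10

# (4.35): NMSE = 1 / (Lp * snr) -> smallest Lp meeting the target
Lp = math.ceil(1 / (nmse_target * snr) - 1e-9)   # small tolerance guards against round-off
# (4.25) divided by B: log2(1 + M * q * beta / N0) bit/symbol, since q = P / B
bits_per_symbol = math.log2(1 + M * snr)
data_symbols = payload_bits / bits_per_symbol
L = Lp + data_symbols

print(f"Lp = {Lp}, {bits_per_symbol:.2f} bit/symbol, about {data_symbols:.0f} data symbols")
print(f"L = {L:.0f}, payload fraction (4.36) = {data_symbols / L:.3f}")
```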
4.2.5 Maximum Likelihood Channel Estimation


The previous section described a protocol for acquiring CSI by transmitting a preamble of length $L_p$ and then computing the average of the received signals according to (4.34). The result is a consistent estimate of the channel vector $\mathbf{h}$, meaning that we obtain an exact estimate as $L_p \to \infty$, but it is not the most accurate estimator for a given finite value of $L_p$. The array geometry and propagation scenario provide valuable information that can be utilized for improved estimation. For example, when considering a ULA in a free-space LOS scenario, we know from (4.19) that only channel vectors with a particular structure can appear:

$$ \mathbf{h} = \sqrt{\beta}\begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta\sin(\varphi)}{\lambda}} \\ e^{-j2\pi\frac{2\Delta\sin(\varphi)}{\lambda}} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta\sin(\varphi)}{\lambda}} \end{bmatrix} = \sqrt{\beta}\,\mathbf{a}(\varphi). \tag{4.37} $$
These feasible channel vectors are parametrized by the channel gain $\beta \in [0, 1]$ and the array response vector $\mathbf{a}(\varphi) \in \mathbb{C}^M$ from (4.28), which only depends on the angle-of-arrival $\varphi \in [-\pi/2, \pi/2]$ because the impinging wavefront is planar. The fact that the $M$-dimensional complex-valued channel vector $\mathbf{h}$ is entirely determined by two real-valued variables indicates that only a tiny subset of all vectors in $\mathbb{C}^M$ can appear as channel vectors in LOS communications. We must select a suitable design criterion when designing a parametric estimator that utilizes this structural knowledge. We will consider the maximum likelihood (ML) criterion, which identifies the feasible channel vector most likely to have produced the received signals during the preamble transmission.

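The channel vector $\mathbf{h} = \sqrt{\beta}\,\mathbf{a}(\varphi)$ is deterministic but unknown. The PDF of the received signal $\mathbf{y}[l] = \mathbf{h}\sqrt{q} + \mathbf{n}[l]$ in (4.32) can be expressed as

$$ f_{\mathbf{y}[l]}(\mathbf{y}[l]) = \frac{1}{(\pi N_0)^M}\, e^{-\frac{\|\mathbf{y}[l] - \mathbf{h}\sqrt{q}\|^2}{N_0}} \tag{4.38} $$

because $\mathbf{y}[l] - \mathbf{h}\sqrt{q} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$, whose PDF is given in (2.80). Since the noise realizations in $\mathbf{y}[1], \ldots, \mathbf{y}[L_p]$ are independent, the joint PDF is

$$ \prod_{l=1}^{L_p} f_{\mathbf{y}[l]}(\mathbf{y}[l]) = \frac{1}{(\pi N_0)^{ML_p}}\, e^{-\frac{1}{N_0}\sum_{l=1}^{L_p}\|\mathbf{y}[l] - \sqrt{q\beta}\,\mathbf{a}(\varphi)\|^2} = \frac{1}{(\pi N_0)^{ML_p}}\, e^{-\frac{1}{N_0}\left(\sum_{l=1}^{L_p}\|\mathbf{y}[l]\|^2 - 2\sqrt{q\beta}\,\Re\left(\mathbf{a}^{\mathrm{H}}(\varphi)\sum_{l=1}^{L_p}\mathbf{y}[l]\right) + L_p qM\beta\right)}, \tag{4.39} $$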
Figure 4.10: The utility function in (4.40) depends on the potential angles-of-arrival $\varphi$, as shown in the figure for one random noise realization. The utility is maximized to obtain the ML estimate $\hat{\varphi}$ in (4.40). The true angle-of-arrival is $\pi/6$ and the ML estimates are marked by stars. The peak values become more distinct as the number of antennas increases.


where the last equality utilizes the fact that $\|\mathbf{h}\|^2 = M\beta$. The ML estimates of $\beta$ and $\varphi$ are the values that jointly maximize (4.39), which is equivalent to maximizing the argument of the exponential function. If we begin by considering the angle-of-arrival estimation, the angle only appears in the term $\Re\left(\mathbf{a}^{\mathrm{H}}(\varphi)\sum_{l=1}^{L_p}\mathbf{y}[l]\right)$; thus, the ML estimate is obtained as

$$ \hat{\varphi} = \arg\max_{\varphi \in [-\frac{\pi}{2}, \frac{\pi}{2}]} \Re\left(\mathbf{a}^{\mathrm{H}}(\varphi)\sum_{l=1}^{L_p}\mathbf{y}[l]\right). \tag{4.40} $$

We should look for the array response vector $\mathbf{a}(\varphi)$ that has the largest real part of the inner product with the sample average of the received signals (except for some missing scaling factors that do not affect the solution). This corresponds to comparing the average received signal with the plausible signal vectors obtained with different spatial frequencies to determine the best match. The maximum can be found by doing a one-dimensional search over the range of possible angles.

Figure 4.10 shows the utility function in (4.40), normalized by $\sqrt{L_p N_0}$ so that each entry of $\sum_{l=1}^{L_p}\mathbf{n}[l]$ has unit variance, for different potential values of $\varphi$. We consider a scenario with $\Delta = \lambda/2$ and $\mathrm{SNR} = q\beta L_p/N_0 = 10$ dB. The true angle-of-arrival is $\pi/6$ and the number of antennas is either $M = 10$ or $M = 20$. The utility function oscillates, but there are distinct maximum peaks in both cases, and the estimate $\hat{\varphi}$ in (4.40) is obtained at the peak values (marked by stars). The ML estimator exploits the ULA's spatial filtering feature to identify the angle of the arriving signal. As the number of antennas increases, the peak becomes taller and narrower, which implies that the estimation accuracy improves with $M$. An equivalent interpretation is that we estimate the spatial frequency of the channel as $\sin(\hat{\varphi})/\lambda$, and we can distinguish between smaller variations when having more antennas. This is a vital benefit of the parametric ML estimator compared to the non-parametric sample-average estimator in (4.34), which achieves an NMSE independent of $M$. The reason is that the former only needs to estimate the two parameters $\beta$ and $\varphi$, while the latter needs to estimate the $M$ complex-valued entries of $\mathbf{h}$.
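The one-dimensional search in (4.40) can be sketched in Python with a brute-force grid over $[-\pi/2, \pi/2]$. This is only one simple way to carry out the search. The grid size, the simulated preamble, and all parameter values are assumptions. The array uses $\Delta = \lambda/2$ and the preamble symbols $x[l] = \sqrt{q}$ from (4.33).

```python
import cmath
import math
import random


def array_response(phi, M):
    """a(phi) from (4.28) with Delta = lambda/2 (assumed)."""
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


def ml_angle(y_sum, grid_points=2001):
    """Grid search for (4.40): maximize Re(a^H(phi) * sum_l y[l]) over [-pi/2, pi/2]."""
    M = len(y_sum)
    best_phi, best_utility = 0.0, -math.inf
    for i in range(grid_points):
        phi = -math.pi / 2 + math.pi * i / (grid_points - 1)
        utility = sum((a.conjugate() * y).real for a, y in zip(array_response(phi, M), y_sum))
        if utility > best_utility:
            best_phi, best_utility = phi, utility
    return best_phi


# Simulated preamble with assumed values; true angle-of-arrival pi/6
rng = random.Random(0)
M, Lp, q, beta, N0, phi_true = 10, 10, 1.0, 1.0, 0.1, math.pi / 6
sigma = math.sqrt(N0 / 2)
h = [math.sqrt(beta) * a for a in array_response(phi_true, M)]
y_sum = [0j] * M                                  # accumulates sum_l y[l]
for _ in range(Lp):
    for m in range(M):
        y_sum[m] += h[m] * math.sqrt(q) + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))

phi_hat = ml_angle(y_sum)
print(f"true angle: {phi_true:.4f} rad, ML estimate: {phi_hat:.4f} rad")
```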


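Example 4.5. Consider the ML channel estimation method described in this section and assume the angle-of-arrival is estimated perfectly: $\hat{\varphi} = \varphi$. Let $\alpha = \sqrt{\beta}$ denote the square root of the channel gain. What is the ML estimate of $\alpha$? What are the mean and variance of the estimation error?

The ML estimate of $\alpha$ is obtained by modifying (4.43) as

$$ \hat{\alpha} = \arg\max_{\alpha \in [0,1]} \frac{2\alpha\sqrt{q}\,\Re\left(\mathbf{a}^{\mathrm{H}}(\varphi)\sum_{l=1}^{L_p}\mathbf{y}[l]\right) - L_p M\alpha^2 q}{N_0} = \frac{\Re\left(\mathbf{a}^{\mathrm{H}}(\varphi)\sum_{l=1}^{L_p}\mathbf{y}[l]\right)}{L_p M\sqrt{q}}, $$

where the solution is obtained by taking the first-order derivative with respect to $\alpha$, equating it to zero, and solving the equation. We notice that $\hat{\alpha}$ is the square root of the ML estimate of $\beta$ in (4.43) (when $\hat{\varphi} = \varphi$). Recalling $\mathbf{h} = \sqrt{\beta}\,\mathbf{a}(\varphi)$ and the received signals from (4.32), we write $\hat{\alpha}$ as

$$ \hat{\alpha} = \frac{\Re\left(\mathbf{a}^{\mathrm{H}}(\varphi)\sum_{l=1}^{L_p}\left(\alpha\sqrt{q}\,\mathbf{a}(\varphi) + \mathbf{n}[l]\right)\right)}{L_p M\sqrt{q}} = \alpha + \underbrace{\frac{\sum_{l=1}^{L_p}\Re\left(\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{n}[l]\right)}{L_p M\sqrt{q}}}_{= n_\alpha}, \tag{4.41} $$

where $n_\alpha$ denotes the estimation error and we used $\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{a}(\varphi) = M$. Since $\mathbb{E}\{\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{n}[l]\mathbf{n}^{\mathrm{H}}[l]\mathbf{a}(\varphi)\} = \mathbf{a}^{\mathrm{H}}(\varphi)\mathbb{E}\{\mathbf{n}[l]\mathbf{n}^{\mathrm{H}}[l]\}\mathbf{a}(\varphi) = \mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{a}(\varphi)N_0 = MN_0$, it follows that $\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(0, MN_0)$ and $\Re(\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{n}[l]) \sim \mathcal{N}(0, MN_0/2)$. The mean of the estimation error is $\mathbb{E}\{n_\alpha\} = 0$ and the variance is

$$ \mathbb{E}\{n_\alpha^2\} = \frac{L_p MN_0/2}{(L_p M\sqrt{q})^2} = \frac{N_0}{2L_p Mq} \tag{4.42} $$

because $n_\alpha$ is a scaled sum of $L_p$ independent random variables, each with variance $MN_0/2$. The error variance in (4.42) decreases with $L_p$, $M$, and $q/N_0$. Hence, increasing the number of antennas improves the estimation quality of the channel gain when using the parametric ML estimator.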
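The result in (4.42) can be checked with a Monte Carlo sketch in Python. The sketch evaluates the closed-form estimate $\hat{\alpha}$ from Example 4.5 with a known angle. The values of $\alpha$, $\varphi$, $M$, $L_p$, $q$, and $N_0$, the spacing $\Delta = \lambda/2$, and the function name are assumptions for illustration.

```python
import cmath
import math
import random


def alpha_error_stats(M, Lp, q, N0, alpha=0.5, phi=math.pi / 6, trials=4000, seed=2):
    """Sample mean and variance of alpha_hat - alpha over independent preambles."""
    rng = random.Random(seed)
    sigma = math.sqrt(N0 / 2)
    a = [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]   # a(phi), Delta = lambda/2
    errors = []
    for _ in range(trials):
        utility = 0.0                                  # Re(a^H(phi) * sum_l y[l])
        for _ in range(Lp):
            for am in a:
                y = alpha * math.sqrt(q) * am + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
                utility += (am.conjugate() * y).real
        alpha_hat = utility / (Lp * M * math.sqrt(q))  # closed-form ML estimate
        errors.append(alpha_hat - alpha)
    mean = sum(errors) / trials
    variance = sum((e - mean) ** 2 for e in errors) / (trials - 1)
    return mean, variance


M, Lp, q, N0 = 8, 5, 1.0, 1.0
mean, variance = alpha_error_stats(M, Lp, q, N0)
print(f"mean error: {mean:+.4f} (theory 0)")
print(f"error variance: {variance:.5f} (theory (4.42): {N0 / (2 * Lp * M * q):.5f})")
```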
Even if the angle-of-arrival is estimated imperfectly, we can still use it to estimate the channel gain. Unlike the last example, we will directly estimate the channel gain instead of estimating its square root. Specifically, we can substitute $\hat{\varphi}$ back into (4.39) and look for the value of $\beta$ that maximizes the PDF. Since only two terms in the exponent contain $\beta$, the ML estimate is obtained as

$$ \hat{\beta} = \arg\max_{\beta \in [0,1]} \frac{2\sqrt{q\beta}\,\Re\left(\mathbf{a}^{\mathrm{H}}(\hat{\varphi})\sum_{l=1}^{L_p}\mathbf{y}[l]\right) - L_p qM\beta}{N_0} = \left(\frac{\Re\left(\mathbf{a}^{\mathrm{H}}(\hat{\varphi})\sum_{l=1}^{L_p}\mathbf{y}[l]\right)}{L_p M\sqrt{q}}\right)^2. \tag{4.43} $$

The solution is obtained by taking the first-order derivative of the exponent with respect to $\beta$, equating it to zero, and solving for $\beta$.⁴


In summary, the ML estimate of the channel vector is $\sqrt{\hat{\beta}}\,\mathbf{a}(\hat{\varphi})$, computed using (4.40) and (4.43). There are many other channel estimation methods for LOS channels, including those that can simultaneously identify multiple signals arriving from different angles. This is a common problem in radar applications. We refer to [51] for a classic overview of such algorithms. Section 8.1 describes a few of these algorithms.


4.3 Modeling of Line-of-Sight MISO Channels 


A MISO channel can be obtained from a SIMO channel by switching the 
transmitter and receiver roles, as discussed in Section 3.3. Figure 4.11 shows a 
general free-space MISO LOS setup of the same kind as in Figure 4.2, but with 
the opposite transmitter/receiver roles. The distances remain the same, with 
d_m denoting the distance between the transmit antenna m and the receiver. Hence, the SIMO and MISO channels are reciprocal so that the channel vector h = [h_1, ..., h_M]^T is the same in both cases. There is no need to repeat any derivations, but we will summarize the results from the previous sections. With arbitrary antenna locations and the assumptions leading to frequency flatness, h_m can be computed using (4.10) when antenna 1 is the reference antenna.
The channel vector becomes 


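$$\mathbf{h} = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_M \end{bmatrix} = \begin{bmatrix} \sqrt{\beta_1} \\ \sqrt{\beta_2}\, e^{-j 2\pi (d_2-d_1)/\lambda} \\ \vdots \\ \sqrt{\beta_M}\, e^{-j 2\pi (d_M-d_1)/\lambda} \end{bmatrix}. \tag{4.44}$$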
⁴We implicitly assumed that the estimate β̂ in (4.43) is not larger than 1. This is a meaningful assumption since the magnitude of the sample average of the received signals is sufficiently small in practice so that β̂ < 1.
Figure 4.11: A free-space MISO LOS channel where d_m is the distance between the transmit antenna m and the receive antenna for m = 1, ..., M.


If we restrict ourselves to a ULA at the transmitter with the antenna spacing Δ and φ being the angle-of-departure from the first transmit antenna to the receiver, then we obtain the same geometry as in Figure 4.3, except that the transmitter and receiver roles are interchanged. Assuming d_1 > 2M²Δ²/λ, so that the receiver is in the far-field of the ULA, (4.44) simplifies (approximately) to
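$$\mathbf{h} = \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ \vdots \\ h_M \end{bmatrix} \approx \sqrt{\beta} \begin{bmatrix} 1 \\ e^{-j 2\pi \Delta \sin(\varphi)/\lambda} \\ e^{-j 2\pi 2\Delta \sin(\varphi)/\lambda} \\ \vdots \\ e^{-j 2\pi (M-1)\Delta \sin(\varphi)/\lambda} \end{bmatrix}, \tag{4.45}$$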
where β is the common channel gain of all the antennas. The expression depends on the angle-of-departure via the spatial frequency sin(φ)/λ. This is the same spatial frequency as when the array receives a signal from the angle φ. The expression can be further simplified by setting Δ = λ/2 to obtain the typical expression in (4.23) for a half-wavelength-spaced ULA.


4.3.1 MISO Channel Capacity with ULA 
The capacity of a MISO channel was presented in (3.49) as

$$C = B \log_2\!\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) \text{ bit/s}. \tag{4.46}$$


For the ULA with h given by (4.45), we have ‖h‖² = Mβ independently of the antenna spacing and angle. If we substitute this into (4.46), we obtain

$$C = B \log_2\!\left(1 + \frac{q M \beta}{N_0}\right) \text{ bit/s}, \tag{4.47}$$
which is the same as for the corresponding SIMO channel. The SNR is M times larger than in the corresponding SISO case where only one transmit antenna is used. This beamforming gain is achieved by using the M antennas of the ULA to focus the transmitted signal on the receiver using MRT. In this case, the MRT vector in (3.44) becomes

$$\mathbf{p} = \frac{\mathbf{h}^*}{\|\mathbf{h}\|} = \frac{1}{\sqrt{M}} \begin{bmatrix} 1 \\ e^{j 2\pi \Delta \sin(\varphi)/\lambda} \\ e^{j 2\pi 2\Delta \sin(\varphi)/\lambda} \\ \vdots \\ e^{j 2\pi (M-1)\Delta \sin(\varphi)/\lambda} \end{bmatrix}. \tag{4.48}$$
All the elements in p have the same magnitude 1/√M since the channel gain is the same for all transmit antennas, a unique feature of LOS channels. Hence, MRT consists of dividing the transmit power equally between the M antennas and phase-shifting the signals before transmission to make sure that the M signal components combine coherently at the receiver; that is, h^T p becomes a sum of M positive terms, each being equal to √(β/M).
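The MRT construction can be checked numerically. The sketch below, with hypothetical helper names and arbitrary unitless values for q, N0, β, and B, builds the channel (4.45) for Δ = λ/2, forms p = h*/‖h‖, and confirms that every term of h^T p equals √(β/M), so that the capacity matches (4.47).

```python
import cmath
import math


def ula_channel(M, beta, phi, spacing=0.5):
    """Far-field channel (4.45); `spacing` is Delta / lambda (assumed 1/2 by default)."""
    return [math.sqrt(beta) * cmath.exp(-2j * math.pi * m * spacing * math.sin(phi))
            for m in range(M)]


def mrt(h):
    """MRT precoding vector (4.48): p = conj(h) / ||h||."""
    norm = math.sqrt(sum(abs(hm) ** 2 for hm in h))
    return [hm.conjugate() / norm for hm in h]


# Hypothetical, unitless example values.
M, beta, phi = 8, 1e-10, math.radians(25)
q, N0, B = 0.1, 1e-12, 1e6

h = ula_channel(M, beta, phi)
p = mrt(h)
terms = [hm * pm for hm, pm in zip(h, p)]   # each term equals sqrt(beta / M)
effective_gain = sum(terms)                 # h^T p = ||h|| = sqrt(M * beta)
capacity = B * math.log2(1 + q * abs(effective_gain) ** 2 / N0)

print([round(abs(pm) * math.sqrt(M), 6) for pm in p])  # all ones, i.e., |p_m| = 1/sqrt(M)
print(effective_gain, math.sqrt(M * beta))
print(capacity, B * math.log2(1 + q * M * beta / N0))   # the second value is (4.47)
```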

The phase-shifts in MRT actually describe different time delays; antennas with slightly longer distances to the receiver will transmit their signals slightly earlier so that all M signals are received synchronously. This principle is illustrated in Figure 4.12, where two of the antennas must transmit earlier to compensate for their longer distances to the receiver. This corresponds to a virtual rotation of the ULA by φ to mimic the situation where the receiver is in the broadside direction. The equivalence between phase-shifts and time delays holds under frequency flatness. ULAs can also be used for a channel that does not satisfy the frequency-flatness condition, which requires the largest propagation delay difference between the antennas to be small compared to 1/B (it can be violated due to, e.g., a huge bandwidth B or a vast distance between the outermost antennas), but in this case, we must select the precoding vector differently to match the corresponding channel vector.

An equivalent description is that the channel vector contains the spatial frequency sin(φ)/λ, and the MRT vector must be matched to that frequency to
ensure that the M signal components combine coherently. This interpretation 
will be instrumental in understanding beamforming from ULAs. 

To achieve the MISO capacity, the transmitter needs to know the channel 
h so that it can compute the MRT vector in (4.48) and the capacity value in 
(4.47) that determines how to encode the data symbols. The receiver needs 
comparably less CSI to decode the transmitted data: it needs to know the 
factor h^T p = ‖h‖ that the data signal is multiplied by in the received signal y = h^T p x + n in (3.41) and the capacity value. The CSI can be acquired by
transmitting a preamble, similar to what was described in Section 4.2.4. One 
option is that the multi-antenna transmitter sends multiple preambles in a 
sequence (one from each of the M antennas) and lets the receiver feed back 
Figure 4.12: When a ULA transmits to a far-field receiver located in a non-broadside direction φ, all antennas except the reference antenna will phase-shift their signals to ensure they are received synchronously at the intended receiver. This is equivalent to a virtual rotation of the ULA by the angle φ to synthesize that the receiver is in its broadside direction. The coloring in this figure represents wave components that are supposed to be received synchronously.


the channel estimates. Since this requires the receiver to transmit something, 
an alternative is to let the single-antenna receiver transmit the preamble, 
in which case we can precisely follow the approach for SIMO channels in 
Section 4.2.4. The fact that the SIMO and MISO channels are reciprocal 
enables us to send the preamble in any direction. It is typically more efficient 
to send preambles in the SIMO direction since one can estimate the entire 
channel vector from a single preamble, while the MISO direction requires one 
preamble per transmit antenna.⁵ Moreover, it is only the multi-antenna device that needs to know the complete channel vector, so it is convenient if it is the one that computes the estimate.


4.3.2 Beamwidth of the Transmitted Signal 


When using MRT in free-space LOS communications, the transmitted signal from a ULA takes the shape of a directional beam when measured in the far-field. This was already illustrated in Section 1.2.1. Figure 1.17 shows beamforming in the direction φ = 0 from a ULA with M = 10 antennas and Δ = λ/2, while the corresponding case of φ = π/2 is illustrated in Figure 1.19. The counterpart of MRT when using the ULA for reception was also exemplified in Figure 4.8, where we noticed that MRC acts as a spatial filter that only amplifies signals arriving from the preferred angular directions.

When the transmitted signal is directed in the angular direction φ, a
receiver located in precisely that direction will obtain a beamforming gain of M. 


⁵If the parametric ML estimator is used and the SNR is high, it is sufficient to transmit two
preambles in the MISO direction to estimate the two unknown parameters: channel gain and 
angle. This is, nevertheless, more preambles than in the SIMO direction. 
Receivers in other nearby angular directions will also achieve a beamforming gain, but it is smaller than M. The angular interval where a beamforming gain is observed is called the beamwidth. The purpose of this section is to quantify the beamwidth for a ULA with Δ = λ/2.

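We begin by defining the array response vector of dimension M as

$$\mathbf{a}_M(\varphi) = \begin{bmatrix} 1 \\ e^{-j\pi \sin(\varphi)} \\ e^{-j\pi 2\sin(\varphi)} \\ \vdots \\ e^{-j\pi (M-1)\sin(\varphi)} \end{bmatrix}. \tag{4.49}$$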
This is a special case of (4.28), where we considered an arbitrary antenna spacing. The array response vector is equal to h/√β where h is taken from (4.23); thus, it describes the normalized channel to any receiver located in the far-field in the angular direction φ. The normalization removes β from the channel expression and thereby eliminates the dependence on the propagation distance, which has no impact on the angular properties of the beam.

Suppose we transmit a signal in the direction φ_beam ∈ [−π/2, π/2] using the MRT vector p = a_M^*(φ_beam)/‖a_M(φ_beam)‖. Then the array factor observed by a receiver located in another direction φ ∈ [−π/2, π/2] is a_M^T(φ)p. This represents the complex scaling factor the signal will experience compared to the single-antenna case. The beamforming gain is the squared magnitude of the array factor:

$$|\mathbf{a}_M^{\mathrm{T}}(\varphi)\mathbf{p}|^2 = \frac{|\mathbf{a}_M^{\mathrm{T}}(\varphi)\mathbf{a}_M^*(\varphi_{\text{beam}})|^2}{M} = \frac{1}{M}\left|\sum_{m=1}^{M} e^{-j\pi(m-1)(\sin(\varphi)-\sin(\varphi_{\text{beam}}))}\right|^2, \tag{4.50}$$

where we utilized the fact that ‖a_M(φ_beam)‖² = M. The beamforming gain in (4.50) is (1/M)|Σ_{m=1}^{M} 1|² = M for a user in direction φ = φ_beam, as expected. To compute the beamforming gain achieved/observed in other angular directions, we make use of the summation formula for geometric series:


$$\sum_{m=1}^{M} x^{m-1} = \begin{cases} \dfrac{1-x^M}{1-x}, & x \neq 1, \\ M, & x = 1. \end{cases} \tag{4.51}$$

The summation in (4.50) is a geometric series with x = e^{−jπ(sin(φ)−sin(φ_beam))}. The case x = 1 occurs when the two angles are equal, φ = φ_beam, and then the beamforming gain in (4.50) becomes M.⁶ For any other φ ∈ [−π/2, π/2],


⁶If we extend the range to φ ∈ [−π, π), we will also obtain x = 1 for φ = π − φ_beam. This demonstrates that a ULA with isotropic antennas cannot beamform in one direction without also sending a beam in the mirror-reflection direction that was illustrated in Figure 4.7.
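we have sin(φ) ≠ sin(φ_beam) (leading to x ≠ 1) and the beamforming gain in (4.50) can be rewritten as

$$|\mathbf{a}_M^{\mathrm{T}}(\varphi)\mathbf{p}|^2 = \frac{1}{M}\left|\frac{1-e^{-j\pi M(\sin(\varphi)-\sin(\varphi_{\text{beam}}))}}{1-e^{-j\pi(\sin(\varphi)-\sin(\varphi_{\text{beam}}))}}\right|^2 = \frac{1}{M}\,\frac{\sin^2\!\left(\frac{\pi M}{2}(\sin(\varphi)-\sin(\varphi_{\text{beam}}))\right)}{\sin^2\!\left(\frac{\pi}{2}(\sin(\varphi)-\sin(\varphi_{\text{beam}}))\right)}, \tag{4.52}$$

where the second equality follows from Euler's formula:

$$\sin(x) = \frac{e^{jx}-e^{-jx}}{2j} = \frac{e^{jx}\left(1-e^{-j2x}\right)}{2j}. \tag{4.53}$$

This formula is applied in both the numerator and the denominator. In particular, we utilize that |1 − e^{−j2x}|² = 4|sin(x)|² = 4 sin²(x).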
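To double-check the algebra leading to (4.52), one can compare the direct sum in (4.50) with the closed form on a grid of angles. This is a minimal sketch; M = 10 and φ_beam = 0.3 rad are arbitrary illustrative choices, and grid points with sin(φ) ≈ sin(φ_beam) are skipped because the closed form is only valid when x ≠ 1.

```python
import cmath
import math


def gain_sum(M, phi, phi_beam):
    """Beamforming gain via the sum in (4.50)."""
    d = math.sin(phi) - math.sin(phi_beam)
    return abs(sum(cmath.exp(-1j * math.pi * m * d) for m in range(M))) ** 2 / M


def gain_closed_form(M, phi, phi_beam):
    """Closed form (4.52); only valid when sin(phi) != sin(phi_beam)."""
    d = math.sin(phi) - math.sin(phi_beam)
    return math.sin(math.pi * M * d / 2) ** 2 / (M * math.sin(math.pi * d / 2) ** 2)


M, phi_beam = 10, 0.3
grid = [-1.5 + 0.01 * k for k in range(301)]
valid = [phi for phi in grid if abs(math.sin(phi) - math.sin(phi_beam)) > 1e-6]
max_diff = max(abs(gain_sum(M, phi, phi_beam) - gain_closed_form(M, phi, phi_beam)) for phi in valid)
print(len(valid), max_diff)  # the difference is at the level of floating-point rounding
```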

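The ratio in (4.52) can be recognized as a squared Dirichlet kernel/function, but this terminology from Fourier analysis does not make it easier to grasp its behavior. However, it can be well approximated for small angle differences by a squared sinc-function. By exploiting the fact that sin²(x) ≈ x² for argument values close to zero (applied to the denominator), we obtain⁷

$$|\mathbf{a}_M^{\mathrm{T}}(\varphi)\mathbf{p}|^2 \approx \frac{1}{M}\,\frac{\sin^2\!\left(\frac{\pi M}{2}(\sin(\varphi)-\sin(\varphi_{\text{beam}}))\right)}{\left(\frac{\pi}{2}(\sin(\varphi)-\sin(\varphi_{\text{beam}}))\right)^2} = M\,\mathrm{sinc}^2\!\left(\frac{M(\sin(\varphi)-\sin(\varphi_{\text{beam}}))}{2}\right), \tag{4.54}$$

where sinc(x) = sin(πx)/(πx).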
This approximation is tight when the beam angle φ_beam and the observation angle φ are similar, in the sense that sin(φ) ≈ sin(φ_beam). The argument of the sinc-function in (4.54) is the aperture length MΔ = Mλ/2 of the ULA multiplied by (sin(φ) − sin(φ_beam))/λ, which is the difference between the spatial frequencies of the channel vector and the MRT vector being used. The sinc-function attains its largest value when the argument is zero, which happens for sin(φ) = sin(φ_beam). The general trend is that the function in (4.54) decreases as the argument attains larger positive or negative values, but it also oscillates and is zero for non-zero integer-valued arguments. This indicates that the beamforming gain is largest in the intended angular direction φ = φ_beam and then decreases in an oscillating manner.


⁷Another way to obtain this approximation is to interpret the summation in (4.50) as the left Riemann sum of the function e^{−jπm(sin(φ)−sin(φ_beam))} with unit-sized partitions. By replacing it with the corresponding Riemann integral ∫_0^M e^{−jπm(sin(φ)−sin(φ_beam))} dm and computing its value, we obtain the final expression in (4.54) after some algebra.
Example 4.6. Consider a ULA with M antennas and Δ = λ/2. Suppose a signal is transmitted in the direction φ_beam = 0. Use the exact formula in (4.52) and the sinc-approximation in (4.54) to determine the beamforming gain

(a) in the direction φ = π/6 when M = 10;
(b) in the direction φ = π/60 when M = 10;
(c) in the direction φ = π/60 when M = 100.


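By inserting the corresponding values into the expressions in (4.52) and (4.54), we obtain the exact and approximate beamforming gains as

(a) exact: (1/10)·sin²(5π sin(π/6))/sin²((π/2) sin(π/6)) = 0.20; approximate: 10·sinc²(5 sin(π/6)) = 0.16;
(b) exact: (1/10)·sin²(5π sin(π/60))/sin²((π/2) sin(π/60)) = 7.96; approximate: 10·sinc²(5 sin(π/60)) = 7.94;
(c) exact: (1/100)·sin²(50π sin(π/60))/sin²((π/2) sin(π/60)) = 1.29; approximate: 100·sinc²(50 sin(π/60)) = 1.29.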

We notice a large beamforming gain of 7.96 in the direction φ = π/60 with M = 10 antennas, while it reduces to 1.29 for M = 100. This is remarkable since the maximum beamforming gain, which equals the number of antennas, simultaneously increases from 10 to 100. We notice that the approximate beamforming gains are very similar to the exact gains when φ is close to φ_beam but can otherwise slightly underestimate the gain.
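The numbers in Example 4.6 can be reproduced with a few lines of Python. This sketch assumes the normalized sinc(x) = sin(πx)/(πx), the convention under which the zeros fall at integer arguments as stated above; the function names are ours.

```python
import math


def sinc(x):
    """Normalized sinc function sin(pi x) / (pi x) with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def gain_exact(M, phi, phi_beam=0.0):
    """Exact beamforming gain (4.52), assuming sin(phi) != sin(phi_beam)."""
    d = math.sin(phi) - math.sin(phi_beam)
    return math.sin(math.pi * M * d / 2) ** 2 / (M * math.sin(math.pi * d / 2) ** 2)


def gain_sinc(M, phi, phi_beam=0.0):
    """Sinc-approximation (4.54)."""
    d = math.sin(phi) - math.sin(phi_beam)
    return M * sinc(M * d / 2) ** 2


cases = {"(a)": (10, math.pi / 6), "(b)": (10, math.pi / 60), "(c)": (100, math.pi / 60)}
results = {label: (round(gain_exact(M, phi), 2), round(gain_sinc(M, phi), 2))
           for label, (M, phi) in cases.items()}
print(results)  # (exact, approximate) per case
```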


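To gain further insights, we will analyze the special case when we transmit in the broadside direction perpendicularly to the array: φ_beam = 0. It then follows from (4.50) and (4.52) that

$$|\mathbf{a}_M^{\mathrm{T}}(\varphi)\mathbf{p}|^2 = \frac{1}{M}\,\frac{\sin^2\!\left(\frac{\pi M}{2}\sin(\varphi)\right)}{\sin^2\!\left(\frac{\pi}{2}\sin(\varphi)\right)}. \tag{4.55}$$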
The solid line in Figure 4.13(a) shows the beamforming gain in (4.55) that is observed for angles φ between −π/2 and π/2 (i.e., from −90° to 90°). A plot like this is called the beam pattern. We consider M = 10 antennas, and the vertical axis is shown in the decibel scale since the beamforming gain variations are substantial. The maximum beamforming gain is 10 dB and is achieved for φ = 0 = φ_beam. This is expected since the ULA focuses its


230 Line-of-Sight Point-to-Point MIMO Channels 


[Figure 4.13 panels. (a) Beamforming gain shown using a rectangular plot: beamforming gain [dB] versus observation angle φ, with an exact curve, a sinc-approximation curve, and the 0 dB line above which the signal is amplified. (b) Beamforming gain shown using a polar plot.]


Figure 4.13: The beamforming gain that is observed in different directions φ when a ULA with M = 10 antennas transmits in the zero-angle direction: φ_beam = 0. The beamforming gain is computed using (4.52). The angles are measured in radians, but the scale is easy to convert to degrees since π/6 is 30°, π/3 is 60°, and π/2 is 90°.
Figure 4.14: A typical beam pattern and summary of the related terminology.


signal in that direction and 10 log10(M) = 10 dB. The beamforming gain gradually reduces when φ is changed, and after a while, it drops below 0 dB. The dashed horizontal line in Figure 4.13(a) corresponds to 0 dB. When the beamforming gain is below this line, the signal is not amplified by the ULA but attenuated compared to the transmission from a single isotropic antenna. This demonstrates that beamforming does not create signal power but merely redistributes power between different angular directions. The beamforming gain oscillates in the left and right parts of the beam pattern, but the general trend is that it decreases as |φ| increases. The dotted curve is computed using the sinc-approximation from (4.54). This approximation has the correct nulls but underestimates the maximum gains of the oscillations when φ is much different from φ_beam.

Figure 4.13(b) shows the same beam pattern using a polar plot. This type 
of plot gives a better visualization of the beam directivities since each beam 
points in its actual angular direction seen from the origin. To achieve this, the 
angles are presented in the opposite order compared to Figure 4.13(a). The 
same nine beams can be observed in both cases: a strong main beam is in the middle (around φ = 0) and four side-beams on each side. We will refer to
the latter as side-lobes to reserve the word “beam” for the intended direction. 
The beamforming gain is precisely zero in between these beams, and those 
points are called nulls. The null locations and the angular widths are easier 
to measure and compare using the rectangular plot in Figure 4.13(a) because 
the strength of a beam does not affect how wide it appears in the plot. Hence, 
we will mainly consider rectangular plots in this book. We summarize this 
terminology in Figure 4.14. 
We want to characterize the width of the main beam because it is within this interval that the beamforming gain is large. The beamwidth can be defined in several different ways. The half-power beamwidth is the width of the angular interval in which the beamforming gain is between M and M/2. This definition quantifies the angular interval where the beamforming gain is close to the maximum gain. It is also known as the 3 dB-beamwidth since a loss of 1/2 in linear scale can also be expressed as 10 log10(1/2) ≈ −3 dB.


Example 4.7. What is the half-power beamwidth when a ULA transmits in the direction φ_beam = 0? Utilize the sinc-approximation and the fact that sinc²(0.443) ≈ 1/2.

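Under the sinc-approximation, the lower and upper limits of the half-power beamwidth are obtained by equating the beamforming gain in (4.54) to M/2:

$$M\,\mathrm{sinc}^2\!\left(\frac{M\sin(\varphi)}{2}\right) = \frac{M}{2} \quad\Leftrightarrow\quad \mathrm{sinc}^2\!\left(\frac{M\sin(\varphi)}{2}\right) = \frac{1}{2}. \tag{4.56}$$

Using the facts that sinc(0.443) ≈ 1/√2 and sinc(−0.443) ≈ 1/√2, we obtain the lower and upper limits as

$$\frac{M\sin(\varphi)}{2} \approx \pm 0.443 \quad\Leftrightarrow\quad \varphi \approx \pm\arcsin\!\left(\frac{0.886}{M}\right). \tag{4.57}$$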


The half-power beamwidth is the difference between these limits and becomes approximately

$$2\arcsin\!\left(\frac{0.886}{M}\right), \tag{4.58}$$

which decreases when M increases. When M is large, we can utilize the Taylor approximation arcsin(x) ≈ x that holds for x ≈ 0 to simplify the half-power beamwidth to 2·0.886/M = 1.772/M. The considered beam transmission is matched to a channel vector containing the spatial frequency sin(φ_beam)/λ = 0 but will provide substantial beamforming gains over channels with spatial frequencies in the interval [−0.886/(Mλ), 0.886/(Mλ)].
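The approximation in (4.58) can be compared with the exact half-power beamwidth of (4.55), which has no simple closed form but is easy to find numerically. The sketch below uses bisection on the interval between the beam peak and the first null, where the exact gain decreases monotonically from M to 0; the function names and tolerance are illustrative choices.

```python
import math


def gain_broadside(M, phi):
    """Beamforming gain (4.55) for phi_beam = 0 (equal to M at phi = 0)."""
    s = math.sin(phi)
    if s == 0:
        return float(M)
    return math.sin(math.pi * M * s / 2) ** 2 / (M * math.sin(math.pi * s / 2) ** 2)


def half_power_beamwidth(M, tol=1e-12):
    """Bisection on (0, arcsin(2/M)) for the angle where the exact gain equals M/2."""
    lo, hi = 0.0, math.asin(min(1.0, 2 / M))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gain_broadside(M, mid) > M / 2:
            lo = mid
        else:
            hi = mid
    return 2 * lo


for M in (10, 100):
    exact = half_power_beamwidth(M)
    print(M, exact, 2 * math.asin(0.886 / M), 1.772 / M)  # exact, (4.58), large-M form
```

Under these assumptions the exact value is marginally wider than (4.58) for M = 10, because the sinc-approximation slightly underestimates the gain, and the two agree closely for M = 100.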


Figure 4.15 shows the main beam from Figure 4.13. The half-power 
beamwidth is indicated, as well as two alternative beamwidth definitions. 
Another option is to determine the angular interval within the main beam where the beamforming gain is above 0 dB. We call this the amplification
beamwidth. This definition quantifies the angular interval where the received 
power is larger than when transmitting from an isotropic antenna. 

One can also measure the total width of the main beam, which we call the 
first-null beamwidth. The benefit of this definition is that it is relatively easy 
to compute an exact analytical expression, while the drawback is that the 
beamforming gain is tiny at the edges of the main beam. The expression in 
Figure 4.15: There are three possible beamwidth definitions, which have different benefits and drawbacks. This figure illustrates what is measured with these definitions in the setup considered in Figure 4.13.


(4.55) contains the ratio between two sine functions that depend on φ. The lower and upper limits of the main beam occur when the numerator is zero while the denominator is not. This happens for angles φ such that

$$\frac{M\pi\sin(\varphi)}{2} = n\pi \tag{4.59}$$

for some integer n ≠ 0. The range of solutions to (4.59) is limited by the fact that sin(φ) ∈ [−1, 1]. This implies that there are 2⌊M/2⌋ nulls, located at the angles

$$\varphi = \arcsin\!\left(\frac{2n}{M}\right) \tag{4.60}$$

for n = ±1, ±2, ..., ±⌊M/2⌋, where ⌊·⌋ rounds the argument to the closest smaller or equal integer. The nulls that specify the left and right limits of the main beam are given by n = ±1, for which (4.60) reduces to


$$\varphi = \pm\arcsin\!\left(\frac{2}{M}\right). \tag{4.61}$$

Hence, the first-null beamwidth is 2 arcsin(2/M). If M ≥ 5, we can utilize the Taylor approximation arcsin(x) ≈ x, which is very tight for x ∈ [0, 0.4], to conclude that the lower and upper limits of the main beam are

$$\varphi \approx \pm\frac{2}{M}. \tag{4.62}$$

Hence, the width of the main beam is approximately 4/M radians, which can also be expressed as (180/π)·(4/M) = 720/(Mπ) degrees. This expression
for the first-null beamwidth is more than twice as large as the approximate
half-power beamwidth 1.772/M that was derived in Example 4.7 (as can also 
be seen in Figure 4.15). However, the two beamwidth definitions share the 
following general behavior: the beamwidth is inversely proportional to the 
number of antennas M. The more antennas are used, the narrower the beams 
will be, which has two benefits: the receiver obtains a stronger signal, and 
less interference is transmitted in other non-intended directions. This is yet 
another reason for using many antennas in wireless communications. 
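The null locations (4.60) and the first-null beamwidth can be checked in the same way. A short sketch for M = 10 (an arbitrary choice); the helper names are ours:

```python
import math


def gain_broadside(M, phi):
    """Beamforming gain (4.55) for phi_beam = 0 (equal to M at phi = 0)."""
    s = math.sin(phi)
    if s == 0:
        return float(M)
    return math.sin(math.pi * M * s / 2) ** 2 / (M * math.sin(math.pi * s / 2) ** 2)


def null_angles(M):
    """Null directions (4.60): arcsin(2n/M) for n = +-1, ..., +-floor(M/2)."""
    return sorted(math.asin(2 * n / M) for n in range(-(M // 2), M // 2 + 1) if n != 0)


M = 10
nulls = null_angles(M)
first_null_bw = 2 * math.asin(2 / M)          # exact, from (4.61)
half_power_bw = 2 * math.asin(0.886 / M)      # sinc-based, from (4.58)
print(len(nulls), max(gain_broadside(M, phi) for phi in nulls))  # zero up to rounding
print(first_null_bw, 4 / M)                                      # (4.62): about 4/M radians
print(math.degrees(first_null_bw), 720 / (M * math.pi))          # the same in degrees
print(first_null_bw / half_power_bw)                             # more than twice as wide
```

For odd M, such as M = 7, the list contains 2⌊M/2⌋ = 6 nulls.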

We can also measure the purity of the beamforming directivity by comparing the gain of the main beam with the peak gains of the largest side-lobes. As noticed earlier, the main beam has a beamforming gain of M. The sinc-approximation in (4.54) implies that the peak of the first side-lobe has a beamforming gain of M sinc²(3/2) = M(2/(3π))² since a sinc-function has peak values roughly at 0, ±3/2, ±5/2, .... Hence, the main beam is roughly (3π/2)² ≈ 13.5 dB stronger than the largest side-lobe. This ratio is independent of the number of antennas; thus, we can shrink the beamwidth by adding extra antennas, but it will not reduce the relative strength of the side-lobes. Following the same approach, we can conclude that the main beam always has a gain that is roughly (5π/2)² ≈ 17.9 dB stronger than the gain of the second largest side-lobe.
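The side-lobe levels quoted above come from the sinc-approximation. The sketch below computes them and, for comparison, grid-searches the exact gain (4.55) between the first and second nulls for the first side-lobe peak. The grid search and sample count are our own choices.

```python
import math


def gain_broadside(M, phi):
    """Beamforming gain (4.55) for phi_beam = 0 (equal to M at phi = 0)."""
    s = math.sin(phi)
    if s == 0:
        return float(M)
    return math.sin(math.pi * M * s / 2) ** 2 / (M * math.sin(math.pi * s / 2) ** 2)


def main_to_first_sidelobe_db(M, samples=100000):
    """Ratio (in dB) between the main-beam gain M and the peak gain between the first two nulls."""
    lo, hi = math.asin(2 / M), math.asin(min(1.0, 4 / M))
    peak = max(gain_broadside(M, lo + (hi - lo) * k / samples) for k in range(samples + 1))
    return 10 * math.log10(M / peak)


sinc_first = 10 * math.log10((3 * math.pi / 2) ** 2)   # sinc-based, first side-lobe
sinc_second = 10 * math.log10((5 * math.pi / 2) ** 2)  # sinc-based, second side-lobe
exact = {M: main_to_first_sidelobe_db(M) for M in (10, 100)}
print(round(sinc_first, 2), round(sinc_second, 2), exact)
```

Under these assumptions the exact main-to-side-lobe ratio comes out slightly below the sinc value, roughly 13 dB for M = 10 and about 13.3 dB for M = 100, in line with the earlier observation that the sinc-approximation underestimates the side-lobe peaks.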

In addition to communication applications, ULAs have been considered for 
radar applications for many years. The goal is then to detect the angle of a 
target (e.g., a vehicle) using methods similar to the angle-of-arrival estimation 
described in Section 4.2.5. In these cases, it is not only the SNR that matters, 
but the beamwidth determines the spatial resolution of the array, also known 
as the angular resolution. For example, Figure 4.10 showed that the utility function in ML estimation looks like a beam around the correct angle. The
width matches the beamwidth; thus, more antennas lead to a smaller width, 
resulting in better estimation accuracy. In radar applications, detecting two 
targets with an angle difference smaller than the beamwidth is hard because 
they appear as a single target with a somewhat larger size. Similarly, if we 
want to transmit communication signals to two LOS receivers simultaneously, 
the mutual interference is small if their angle separation is larger than the 
beamwidth. We will consider localization and sensing in Chapter 8. 

Figure 4.16 considers the same setup as in the previous figure, except that 
we are now comparing M = 10 and M = 20 to show how the beamwidth 
shrinks as we increase the number of antennas (irrespective of which beamwidth 
definition we consider). When having M = 20 antennas, we get roughly half 
the beamwidth compared to the case of M = 10. The width of the side- 
lobes shrinks similarly, which also means there are more side-lobes. Since 
we assumed an antenna spacing of Δ = λ/2 in this section, increasing the
number of antennas is equivalent to making the ULA wider. If we generalized 
the results to consider other antenna spacings, we would observe that the 
aperture length of the ULA determines the beamwidth and not the number 
of antennas. We will return to this in Section 4.3.4. 
Figure 4.16: Comparison of the beamforming gains with M = 10 and M = 20 in the same setup as in Figure 4.13.


The beamwidth also depends on which angular direction we point the beam in. If we consider an arbitrary beam direction φ_beam ∈ [−π/2, π/2], then the nulls appear when the sine function in the numerator of (4.52) is zero while the denominator is non-zero. This happens for angles φ such that

$$\frac{M\pi(\sin(\varphi)-\sin(\varphi_{\text{beam}}))}{2} = n\pi \tag{4.63}$$

for non-zero integers n satisfying −(M/2)(1 + sin(φ_beam)) ≤ n ≤ (M/2)(1 − sin(φ_beam)). The limits of the main beam are usually obtained by n = ±1, which results in

$$\varphi = \arcsin\!\left(\frac{2}{M} + \sin(\varphi_{\text{beam}})\right), \tag{4.64}$$

$$\varphi = -\arcsin\!\left(\frac{2}{M} - \sin(\varphi_{\text{beam}})\right). \tag{4.65}$$

The first-null beamwidth is the difference between (4.64) and (4.65):

$$\arcsin\!\left(\frac{2}{M} + \sin(\varphi_{\text{beam}})\right) + \arcsin\!\left(\frac{2}{M} - \sin(\varphi_{\text{beam}})\right). \tag{4.66}$$


This is an increasing function of |sin(φ_beam)|, as can be proved by showing that its first-order derivative is positive for sin(φ_beam) > 0 and noting that it is a symmetric function of sin(φ_beam). Hence, the beamwidth gradually increases as the beam direction is changed from the broadside direction φ_beam = 0 to the end-fire direction φ_beam = ±π/2. In other words, the angular resolution is worse in the vicinity of the end-fire direction because the spatial frequency sin(φ_beam)/λ varies slowly with the beam angle in these situations.
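The growth of (4.66) with |sin(φ_beam)| can be tabulated directly. In this sketch the function returns None when 2/M + |sin(φ_beam)| > 1, which is our own way of flagging the split-main-beam case described below, where one of the arcsine arguments leaves [−1, 1].

```python
import math


def first_null_beamwidth(M, phi_beam):
    """First-null beamwidth (4.66) in radians.

    Returns None when 2/M + |sin(phi_beam)| > 1, i.e., when one of the nulls in
    (4.64)-(4.65) does not exist because the main beam is split at an end-fire direction.
    """
    s = math.sin(phi_beam)
    if 2 / M + abs(s) > 1:
        return None
    return math.asin(2 / M + s) + math.asin(2 / M - s)


M = 10
widths = {deg: first_null_beamwidth(M, math.radians(deg)) for deg in (0, 15, 30, 45, 60)}
for deg, bw in widths.items():
    print(deg, None if bw is None else round(math.degrees(bw), 2))
```

For M = 10, the width grows monotonically from about 23° at broadside, and φ_beam = 60° already triggers the split case.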
Figure 4.17: The beamforming gain that is observed in different directions φ when a ULA with M = 10 antennas transmits a beam in the directions φ_beam = π/3, φ_beam = π/4, or φ_beam = 0.


When the beam direction is close to π/2 or −π/2, it can happen that the main beam is divided within the interval [−π/2, π/2] so that one part appears close to −π/2 and the other part appears close to +π/2. In this case, the interval −(M/2)(1 + sin(φ_beam)) ≤ n ≤ (M/2)(1 − sin(φ_beam)) contains either only positive or only negative non-zero integers. The smallest and largest n then give the nulls of the main beam in the interval.


Figure 4.17 shows the beamforming gains that can be observed in different directions when the beam is transmitted in the directions φ_beam = π/3 radians (60°), φ_beam = π/4 radians (45°), or φ_beam = 0 (the broadside direction as in the previous figures). As expected, the beamwidth is smallest when transmitting in the broadside direction, while it grows when we increase |φ_beam| towards any of the end-fire directions ±π/2. The main beam is divided into two pieces when φ_beam = π/3, of which the majority appears in the right part of the figure and a small piece appears to the left. The maximum beamforming gain is equal to M in all three cases, but the shape of the signal leakage in other directions is different.


The wider beamwidths obtained with φ_beam = π/3 and φ_beam = π/4 might give the impression that there is more signal power in these cases (e.g., the areas under the curves are larger). However, the precise interpretation is that a larger fraction of the signal power is radiated into the horizontal plane than with broadside beamforming. To demonstrate this property, Figure 4.18 shows the beam patterns for φ_beam ∈ {0, π/4} in all three dimensions. The ULA is deployed along the y-axis and the xy-plane is the horizontal plane; thus, the beam patterns along the dotted curves are those shown in Figure 4.17. Note that
(b) Beamforming in the azimuth direction φ_beam = π/4.


Figure 4.18: The beamforming gain observed in different 3D directions when a ULA with 
M = 10 antennas is deployed along the y-axis. Beamforming in two different azimuth directions 
is considered, and the dotted curves show the gain variations in the horizontal plane. These are 
the same beam patterns as shown in Figure 4.17. 
the beam patterns are invariant if we rotate them around the y-axis; thus, the beam points in the desired direction in the azimuth plane and in many other directions with non-zero elevation angles. The beamwidth in the horizontal plane is indeed smaller with φ_beam = 0 than with φ_beam = π/4, but this is compensated for since the beam covers more area in the elevation dimension in the former case. The average beamforming gain over the sphere is 1 in both cases. Since the beamwidth in the elevation dimension is broad, covering all elevation angles, the beams created by a ULA have roughly the same shape as orange slices.
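The claim that the average beamforming gain over the sphere equals 1 can also be verified numerically. The sketch assumes that, for a ULA along the y-axis, the gain depends on the direction only through the direction cosine u = cos(θ) sin(φ) along the array axis, where θ is the elevation angle, so that (4.50) holds with sin(φ) replaced by u. For directions uniformly distributed on the sphere, u is uniform on [−1, 1], which reduces the spherical average to a one-dimensional integral.

```python
import cmath
import math


def gain_vs_direction_cosine(M, u, phi_beam):
    """Gain (4.50) with sin(phi) replaced by u, the direction cosine along the array axis
    (assumption: for a ULA on the y-axis, u = cos(elevation) * sin(azimuth))."""
    d = u - math.sin(phi_beam)
    return abs(sum(cmath.exp(-1j * math.pi * m * d) for m in range(M))) ** 2 / M


def sphere_average_gain(M, phi_beam, n=4000):
    """Average over directions drawn uniformly on the sphere.

    For such directions, u is uniformly distributed on [-1, 1], so the average equals
    (1/2) * integral of the gain over u in [-1, 1], evaluated here with the midpoint rule.
    """
    h = 2 / n
    return sum(gain_vs_direction_cosine(M, -1 + (k + 0.5) * h, phi_beam) for k in range(n)) * h / 2


averages = {name: sphere_average_gain(10, pb) for name, pb in (("0", 0.0), ("pi/4", math.pi / 4))}
print(averages)
```

The midpoint rule is exact here up to rounding, because the gain is a trigonometric polynomial in u whose non-constant terms e^{−jπku}, with 0 < |k| < M, average to zero over [−1, 1].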

It is a common practice to deploy ULAs so that most of the intended 
receivers are located close to the broadside direction, where the beams are 
sharper (smaller beamwidth). This feature is essential in radar applications 
where one can detect targets in different angular directions if the main beams 
leading to those targets are non-overlapping. Since the total signal power is 
constant irrespective of the value of φ_beam, having a sharp main beam in the
desired plane leads to the main beam extending into other dimensions and/or 
the side-lobes becoming larger. In wireless communications, all the signal power 
that does not reach the desired receiver can cause interference to other receivers, 
depending on where those receivers are. Which type of angular beam pattern 
causes the least interference varies depending on the deployment scenario and 
distribution of users over the propagation environment. If the users are closely 
located, the beamwidth should be small so that the non-intended user is not 
within the main beam. However, if the users have well-separated angles, the beamwidth can be broad because the interference is then caused by the side-lobes anyway.


Example 4.8. An M-antenna ULA with Δ = λ/2 transmits with MRT to a receiver in the direction φ_beam = π/3 (60°). An unintended receiver is in the direction φ = 13π/36 (65°). How many antennas are needed to ensure the unintended receiver is outside the half-power beamwidth?

We must find how many antennas are needed to achieve a beamforming gain smaller than M/2 at the angle φ of the unintended receiver. By using the sinc-approximation in (4.54), we can express this condition as

$$M\,\mathrm{sinc}^2\!\left(\frac{M(\sin(\varphi)-\sin(\varphi_{\text{beam}}))}{2}\right) < \frac{M}{2} \quad\Leftrightarrow\quad \mathrm{sinc}^2(0.02014\cdot M) < \frac{1}{2}, \tag{4.67}$$

since (sin(φ) − sin(φ_beam))/2 ≈ 0.02014. We recall from Example 4.7 that sinc²(0.443) ≈ 1/2 and notice that sinc²(x) is a decreasing function for x ∈ [0, 1], which covers the main beam. This implies that (4.67) can be rewritten as

$$0.02014\cdot M > 0.443 \quad\Leftrightarrow\quad M > 21.996. \tag{4.68}$$


Since the number of antennas must be an integer, we need at least 22 antennas 
to ensure that the unintended receiver is outside the half-power beamwidth. 
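Example 4.8 can be re-run by searching for the smallest M that pushes the gain toward φ = 13π/36 below M/2. The sketch evaluates both the sinc-approximation (4.54) and the exact expression (4.52); the search loop and function names are ours.

```python
import math


def sinc(x):
    """Normalized sinc function sin(pi x) / (pi x) with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def gain_sinc(M, phi, phi_beam):
    """Sinc-approximation (4.54)."""
    return M * sinc(M * (math.sin(phi) - math.sin(phi_beam)) / 2) ** 2


def gain_exact(M, phi, phi_beam):
    """Exact expression (4.52), assuming sin(phi) != sin(phi_beam)."""
    d = math.sin(phi) - math.sin(phi_beam)
    return math.sin(math.pi * M * d / 2) ** 2 / (M * math.sin(math.pi * d / 2) ** 2)


def min_antennas(gain_fn, phi, phi_beam, M_max=1000):
    """Smallest M for which the gain toward phi falls below the half-power level M/2."""
    for M in range(1, M_max + 1):
        if gain_fn(M, phi, phi_beam) < M / 2:
            return M
    return None


phi_beam, phi = math.pi / 3, 13 * math.pi / 36
print(min_antennas(gain_sinc, phi, phi_beam), min_antennas(gain_exact, phi, phi_beam))
```

The sinc-approximation reproduces the 22 antennas of (4.68). The exact expression gives 23, because at M = 22 it evaluates to roughly 0.5004M, just above the threshold; this matches the earlier observation that the sinc-approximation slightly underestimates the gain.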
The beamwidth concept is most easily explained when transmitting from a ULA but also exists for arrays with other geometries. In the case of a SIMO channel, the counterpart for signal reception using a ULA is the spatial filtering illustrated in Figure 4.8. If MRC is used to coherently combine the signals from a transmitter with angle-of-arrival φ = 0, then the beamwidth defines the angular interval around φ = 0 for which other incoming (interfering) signals will also be partially coherently combined.


4.3.3 Grid of Orthogonal Beams 


The beamwidth demonstrates how beamforming focuses the emitted power 
into a limited angular interval. A receiver in other directions will be reached by 
comparably less power and possibly zero power if it is located in a null direction. 
This is a desired feature when transmitting data to a known receiver in a 
known direction, but it is problematic when the goal is to broadcast signals to 
unknown receivers in unknown directions. For example, a cellular base station 
must occasionally announce its existence by broadcasting common messages 
over its entire coverage area to tell prospective user devices how to connect to 
the base station, thereby becoming one of the receivers with a known direction. 
In 5G, the broadcasting starts with the primary synchronization signal (PSS). 
Generally speaking, we need a procedure to reach prospective users with a 
relatively high beamforming gain without knowing their locations. To this 
end, we can preselect a collection of beams and transmit the same common 
message through each of them. This procedure is called beam sweeping. We 
want to select the beam directions to ensure that any prospective user is 
located within the main beam of at least one of the beams in the collection. 

Consider the MISO channel to an unknown receiver, represented by an unknown $M$-dimensional vector $\mathbf{h}$ in the vector space $\mathbb{C}^M$. Any non-zero vector can be written as a linear combination of $M$ orthonormal basis vectors $\mathbf{b}_1, \ldots, \mathbf{b}_M \in \mathbb{C}^M$:

$$ \mathbf{h} = c_1 \mathbf{b}_1 + \ldots + c_M \mathbf{b}_M, \tag{4.69} $$

where at least one of the scalar coefficients $c_1, \ldots, c_M$ is non-zero. If we use the conjugate of the basis vector $\mathbf{b}_i$ as the precoding vector, the SNR at the unknown receiver will be proportional to

$$ \left|\mathbf{h}^{\mathrm{T}} \mathbf{b}_i^*\right|^2 = \left|\left(c_1 \mathbf{b}_1^{\mathrm{T}} + \ldots + c_M \mathbf{b}_M^{\mathrm{T}}\right) \mathbf{b}_i^*\right|^2 = |c_i|^2, \tag{4.70} $$

where we utilized that orthonormal vectors satisfy $\mathbf{b}_i^{\mathrm{T}} \mathbf{b}_i^* = 1$ and $\mathbf{b}_m^{\mathrm{T}} \mathbf{b}_i^* = 0$ for $m \neq i$. It is possible that only one of the coefficients in (4.69) is non-zero for a particular receiver; thus, we will have to transmit beams using all $M$ basis vectors to ensure that the SNR is non-zero for at least one beam. The conclusion is that we need a collection of $M$ beams to reach all users and that those beams should constitute an orthonormal basis. However, many different orthonormal bases can be created. One option is to utilize the columns of the identity matrix $\mathbf{I}_M$ as the basis vectors, which effectively means that we will transmit from one antenna at a time. This is undesirable because there will never be any beamforming gain; each antenna merely spreads a relatively weak signal over the coverage area. Instead, we want to divide the coverage area into $M$ subareas (i.e., angular sectors) and let each basis vector beamform towards one of these subareas. We will describe how to do that in the situation considered in the previous sections: the transmitter is equipped with a ULA with $\Delta = \lambda/2$ as the antenna spacing, and there are far-field free-space LOS channels to all the prospective user locations.

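If we transmit a beam in the broadside direction $\varphi_{\mathrm{beam}} = 0$, then we recall from (4.60) that there are nulls at the angles

$$ \varphi = \arcsin\left(\frac{2n}{M}\right) \tag{4.71} $$

for $n = \pm 1, \pm 2, \ldots, \pm\lfloor M/2 \rfloor$. If we substitute these angles into (4.49), we can find the array response vectors that are orthogonal to the one obtained by $\varphi_{\mathrm{beam}} = 0$. If we normalize these vectors to have unit length, as we normally do when selecting precoding vectors, we obtain

$$ \frac{1}{\sqrt{M}}\,\mathbf{a}\left(\arcsin\left(\frac{2n}{M}\right)\right) = \frac{1}{\sqrt{M}} \begin{bmatrix} 1 \\ e^{-j2\pi \frac{n}{M}} \\ e^{-j2\pi \frac{2n}{M}} \\ \vdots \\ e^{-j2\pi \frac{(M-1)n}{M}} \end{bmatrix} = \frac{1}{\sqrt{M}} \begin{bmatrix} 1 \\ \omega_M^{n} \\ \omega_M^{2n} \\ \vdots \\ \omega_M^{(M-1)n} \end{bmatrix}, \tag{4.72} $$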


where the last expression uses the notation $\omega_M = e^{-j2\pi/M}$ that was first introduced in (2.198) when defining the DFT matrix. Several observations can be made by inspecting (4.72) and comparing it to the $M \times M$ DFT matrix $\mathbf{F}_M$. Firstly, the vector with index $n$ contains samples of the complex exponential $e^{-j2\pi \frac{2n}{M\lambda} x}$ with the spatial frequency $\frac{2n}{M\lambda}$. The distance between the samples is the antenna spacing $\Delta = \lambda/2$. Secondly, the vectors in (4.72) for the positive values $n = 1, \ldots, \lfloor M/2 \rfloor$ are columns of the DFT matrix. Thirdly, the vectors in (4.72) for the negative values $n = -1, -2, \ldots, -\lfloor M/2 \rfloor$ can be rewritten to only contain positive exponents of $\omega_M$. The key is to utilize the property $\omega_M^M = 1$ to rewrite $\omega_M^{n}$ as $\omega_M^{M+n}$ for these negative values of $n$. Hence, all vectors with negative values of $n$ are identical to the last $\lfloor M/2 \rfloor$ columns of the DFT matrix. If $M$ is even, then $n = M/2$ and $n = -M/2$ result in the same precoding vector. The convention is to only consider $n = -M/2$ in this case. Finally, the precoding vector for broadside transmission contains only ones, just like the first column of the DFT matrix.
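The index bookkeeping above, where positive $n$ maps to DFT column $n$ and negative $n$ maps to column $M + n$, can be checked numerically. The following Python sketch (assuming NumPy, and assuming that (4.49) has the entries $e^{-j\pi m \sin(\varphi)}$, i.e., (4.74) with $\Delta = \lambda/2$; the function name `dft_columns_match_beams` is illustrative) builds the normalized array responses in (4.72) and compares them with the normalized DFT columns.

```python
import numpy as np

def dft_columns_match_beams(M):
    """Check that the normalized array responses (4.72) coincide with the DFT columns."""
    m = np.arange(M)
    F = np.exp(-2j * np.pi * np.outer(m, m) / M) / np.sqrt(M)    # entries omega_M^(m n) / sqrt(M)
    # n = -floor(M/2), ..., ceil(M/2) - 1; for even M, n = +M/2 is dropped in favor of n = -M/2
    for n in range(-(M // 2), (M + 1) // 2):
        beam = np.exp(-1j * np.pi * m * (2 * n / M)) / np.sqrt(M)  # a(arcsin(2n/M)) / sqrt(M)
        if not np.allclose(beam, F[:, n % M]):                   # negative n -> column M + n
            return False
    # The normalized DFT columns are mutually orthogonal: F^H F = I
    return np.allclose(F.conj().T @ F, np.eye(M))

print([dft_columns_match_beams(M) for M in (4, 7, 10)])   # [True, True, True]
```

Using Python's `n % M` for negative `n` is a compact way to express the rewriting $\omega_M^{n} = \omega_M^{M+n}$.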

The conclusion is that the columns of the $M$-dimensional DFT matrix contain a collection of $M$ beams, each associated with beamforming in a distinct angular direction. This is called a grid of beams since it is obtained by sampling the range of azimuth angles $\varphi$ to obtain $M$ grid points. The points are not equally spaced in the angular domain $[-\pi/2, \pi/2]$ but in the spatial frequency domain $[-1/\lambda, 1/\lambda]$ to take the angle-dependent beamwidths into account; we need more beams close to the broadside direction and fewer beams close to the end-fire directions. We identified this particular grid of beams by starting from the first column of the DFT matrix (i.e., beamforming in the broadside direction) and noticing that the other columns of the DFT matrix are orthogonal to it and, thus, can be used to beamform in its null directions. Moreover, we know that the DFT matrix is unitary, which implies that all columns are mutually orthogonal and constitute an orthonormal basis in $\mathbb{C}^M$. This implies that each column will form a main beam centered at a null of all the other beams. It is, therefore, appropriate to call this a grid of orthogonal beams to distinguish it from other prospective collections of beams. For obvious reasons, these beams are also known as DFT beams.

Figure 4.19(a) shows the ten orthogonal beams obtained from the DFT matrix when having $M = 10$ antennas. The seven beams in the middle have similar beamwidths and angular separation, while the outermost beams are substantially broader and, thus, more spread out. This aligns with the previous beamwidth discussion and demonstrates how a ULA has a worse angular resolution close to the end-fire directions. In fact, the outermost beam is split into two parts, of which one half points to the left and the other half points to the right. Figure 4.19(b) shows the same ten beams but considering the spatial frequency $\sin(\varphi)/\lambda$ on the horizontal axis, in which case all the beams become equally wide. The spatial resolution of a ULA is fundamentally a spatial frequency resolution, as we have hinted earlier in this chapter. The maximum beamforming gain is 10 dB for all the beams because $M = 10$. The neighboring beams intersect at the point where the beamforming gain has been reduced by almost 4 dB to around 6 dB. Hence, if the considered multi-antenna base station broadcasts a common message by repeating it ten times using these different beams, any LOS user is guaranteed a beamforming gain of at least 6 dB, although the beams were not adapted to any user location.
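The 6 dB guarantee for $M = 10$ can be reproduced with a short Python sketch (assuming NumPy; the function name `dft_beam_gains` and the grid resolution are illustrative choices). It evaluates the gain $|\mathbf{f}_n^{\mathrm{H}}\mathbf{a}(\varphi)|^2$ of every normalized DFT column $\mathbf{f}_n$, which corresponds to MRT towards the grid direction $\arcsin(2n/M)$, on a fine grid of $\sin(\varphi)$ values, and keeps the best beam for each direction.

```python
import numpy as np

def dft_beam_gains(M, sin_phi):
    """Gain of each DFT beam (rows) at each direction sin(phi) (columns), half-wavelength ULA."""
    m = np.arange(M)
    F = np.exp(-2j * np.pi * np.outer(m, m) / M) / np.sqrt(M)    # normalized DFT columns f_n
    A = np.exp(-1j * np.pi * np.outer(m, sin_phi))              # array responses a(phi)
    return np.abs(F.conj().T @ A) ** 2

M = 10
sin_phi = np.linspace(-1, 1, 20001)             # covers all azimuth angles in [-pi/2, pi/2]
best = dft_beam_gains(M, sin_phi).max(axis=0)   # gain of the best beam in each direction
print(f"peak {10 * np.log10(best.max()):.2f} dB, worst case {10 * np.log10(best.min()):.2f} dB")
```

This should print a peak of 10.00 dB and a worst case of about 6.11 dB, matching the intersection points in Figure 4.19.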

(a) Grid of orthogonal beams shown using a polar plot as a function of the azimuth observation angle $\varphi$.
(b) Grid of orthogonal beams as a function of the spatial frequency.

Figure 4.19: The beam patterns of the grid of ten orthogonal beams obtained from the columns of the DFT matrix when using a ULA with $M = 10$ antennas. The angles in (a) are measured in radians, but the scale can easily be converted into degrees since $\pi/6$ is 30°, $\pi/3$ is 60°, and $\pi/2$ is 90°. An equivalent representation as a function of the spatial frequency is given in (b) to showcase that all beams are equally wide in that domain.

The loss in beamforming gain remains around 4 dB at the intersection point between two adjacent DFT beams, even if we increase the number of antennas. To prove this, we consider the adjacent beams in the directions $\arcsin\left(\frac{2n}{M}\right)$ and $\arcsin\left(\frac{2(n+1)}{M}\right)$ for some feasible $n$. The intersection point between these beams is at the angle $\arcsin\left(\frac{2n+1}{M}\right)$.⁸ If we substitute the first beam and the intersection point into (4.52), we obtain the beamforming gain

$$ \frac{1}{M}\frac{\sin^2\left(\frac{\pi M}{2}\left(\frac{2n+1}{M} - \frac{2n}{M}\right)\right)}{\sin^2\left(\frac{\pi}{2}\left(\frac{2n+1}{M} - \frac{2n}{M}\right)\right)} = \frac{1}{M}\frac{\sin^2\left(\frac{\pi}{2}\right)}{\sin^2\left(\frac{\pi}{2M}\right)} = \frac{1}{M\sin^2\left(\frac{\pi}{2M}\right)} \geq M\left(\frac{2}{\pi}\right)^2, \tag{4.73} $$

where the last step follows from the inequality $\sin^2(x) \leq x^2$. The lower bound is approached when $M$ is large, as seen from making a first-order Taylor approximation of the sine function. This reveals that the beamforming gain at the intersection point between two DFT beams is at most $20\log_{10}(\pi/2) \approx 3.9$ dB below the maximum value $M$, and the loss approaches this value when $M$ grows large. In summary, when a ULA with $M$ antennas transmits a grid of orthogonal beams, every LOS user is guaranteed to find one beam that achieves a beamforming gain of at least $M(2/\pi)^2$.

⁸ The intersection point is found by comparing (4.52) for the two beam directions and identifying the value of $\varphi$ where the expressions are equal.
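How quickly the loss at the beam crossing approaches its limit can be tabulated from the exact expression in (4.73). The Python sketch below (assuming NumPy; `crossing_loss_db` is an illustrative name) prints the gain at the crossing relative to the peak value $M$, in dB, for a few array sizes, together with the limit $20\log_{10}(2/\pi)$.

```python
import numpy as np

def crossing_loss_db(M):
    """Gain at the crossing of two adjacent DFT beams relative to the peak M, from (4.73), in dB."""
    crossing_gain = 1 / (M * np.sin(np.pi / (2 * M)) ** 2)
    return 10 * np.log10(crossing_gain / M)

for M in (2, 4, 10, 64, 1024):
    print(M, round(crossing_loss_db(M), 2))
print("limit", round(20 * np.log10(2 / np.pi), 2))   # about -3.92 dB
```

The loss grows monotonically with $M$ (from about 3.0 dB at $M = 2$) and never exceeds the limit, in line with the bound in (4.73).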


4.3.4 Impact of Aperture Length and Antenna Spacing 


The beamwidth analysis in the previous sections shows how the directivity of the radiation pattern can be controlled under the assumption of a ULA with antenna spacing $\Delta = \lambda/2$. If the antenna array has a different geometry, the radiation patterns that beamforming creates will be different: a more irregular geometry will result in a more irregular pattern. In this section, we will still consider ULAs but determine the impact of the antenna spacing. Recall from (4.19) that the LOS channel vector with an arbitrary antenna spacing $\Delta$ and wavelength $\lambda$ can be expressed as $\mathbf{h} = \sqrt{\beta}\,\mathbf{a}(\varphi)$, where the array response vector for the angle-of-departure $\varphi$ is

$$ \mathbf{a}(\varphi) = \begin{bmatrix} 1 \\ e^{-j2\pi \frac{\Delta}{\lambda}\sin(\varphi)} \\ e^{-j2\pi \frac{2\Delta}{\lambda}\sin(\varphi)} \\ \vdots \\ e^{-j2\pi \frac{(M-1)\Delta}{\lambda}\sin(\varphi)} \end{bmatrix}. \tag{4.74} $$

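If we transmit a signal in the direction $\varphi_{\mathrm{beam}} \in [-\pi/2, \pi/2]$ using the MRT vector $\mathbf{p} = \mathbf{a}^*(\varphi_{\mathrm{beam}})/\|\mathbf{a}(\varphi_{\mathrm{beam}})\|$, then we can follow the approach in (4.50)–(4.52) to determine the beamforming gain that is observed by a receiver located in any direction $\varphi \in [-\pi/2, \pi/2]$:

$$ \frac{\left|\mathbf{a}^{\mathrm{T}}(\varphi)\,\mathbf{a}^*(\varphi_{\mathrm{beam}})\right|^2}{\|\mathbf{a}(\varphi_{\mathrm{beam}})\|^2} = \frac{1}{M}\left|\sum_{m=0}^{M-1} e^{-j2\pi \frac{m\Delta}{\lambda}\left(\sin(\varphi) - \sin(\varphi_{\mathrm{beam}})\right)}\right|^2 = \begin{cases} M, & \text{if } \frac{\Delta}{\lambda}\left(\sin(\varphi) - \sin(\varphi_{\mathrm{beam}})\right) \text{ is an integer}, \\[2mm] \dfrac{1}{M}\dfrac{\sin^2\left(\pi\frac{M\Delta}{\lambda}\left(\sin(\varphi) - \sin(\varphi_{\mathrm{beam}})\right)\right)}{\sin^2\left(\pi\frac{\Delta}{\lambda}\left(\sin(\varphi) - \sin(\varphi_{\mathrm{beam}})\right)\right)}, & \text{otherwise}. \end{cases} \tag{4.75} $$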


The last equality follows from the summation formula for geometric series in (4.51) and Euler’s formula. The first row in (4.75) shows that the maximum beamforming gain is $M$ and is achieved for $\varphi = \varphi_{\mathrm{beam}}$ (the intended beamforming direction) because, in that case, we have $\frac{\Delta}{\lambda}(\sin(\varphi) - \sin(\varphi_{\mathrm{beam}})) = 0$. There might be additional values of $\varphi$ for which $\frac{\Delta}{\lambda}(\sin(\varphi) - \sin(\varphi_{\mathrm{beam}}))$ becomes an integer. The maximum gain $M$ thus depends on the number of antennas but not on the antenna spacing. The reason is that MRT ensures that all antennas’ signal components superimpose constructively in the desired angular direction, irrespective of the array geometry. The second expression in (4.75) characterizes the beamforming gain in other directions, and it depends on the normalized antenna spacing relative to the wavelength, which we will denote in this section as

$$ \Delta_\lambda = \frac{\Delta}{\lambda}. \tag{4.76} $$

The numerator of the second expression in (4.75) also contains the product between the number of antennas and the normalized antenna spacing, which we will further denote as

$$ D_\lambda = \frac{M\Delta}{\lambda} = M\Delta_\lambda. \tag{4.77} $$

This is the normalized aperture length, according to Definition 4.1, which is the physical aperture length $M\Delta$ of the ULA normalized by the wavelength. To analyze how the beamforming gain depends on $\Delta_\lambda$, $D_\lambda$, and the observation angle $\varphi$, we first introduce the variable

$$ \Phi = \sin(\varphi_{\mathrm{beam}}) - \sin(\varphi). \tag{4.78} $$


For a given value of $\varphi_{\mathrm{beam}}$, only the range $\Phi \in [\sin(\varphi_{\mathrm{beam}}) - 1, \sin(\varphi_{\mathrm{beam}}) + 1]$ can be achieved since $\sin(\varphi)$ takes values between $-1$ and $1$. When considering all possible beamforming directions, we should consider values of $\Phi$ from $-2$ to $2$. We notice that $\Phi/\lambda$ is the difference in spatial frequency between the beam direction and the observation direction.

The beamforming gain in (4.75) can be expressed as a function of $\Phi$ as

$$ A(\Phi) = \frac{1}{M}\frac{\sin^2\left(\pi\frac{M\Delta}{\lambda}\Phi\right)}{\sin^2\left(\pi\frac{\Delta}{\lambda}\Phi\right)} = \frac{1}{M}\frac{\sin^2(\pi D_\lambda \Phi)}{\sin^2(\pi \Delta_\lambda \Phi)}. \tag{4.79} $$

The squared sine function is a periodic function that repeats when the argument changes by $\pm\pi$. This implies that the numerator in (4.79) is a periodic function of $\Phi$ with period $1/D_\lambda$ and the denominator is periodic with a period of $1/\Delta_\lambda$. The numerator varies $M$ times faster than the denominator because $D_\lambda/\Delta_\lambda = M$; thus, $A(\Phi)$ has a period of $1/\Delta_\lambda$. We also have that

$$ A\left(\frac{m}{D_\lambda}\right) = 0, \quad m = \pm 1, \ldots, \pm(M-1), \tag{4.80} $$

since the numerator is zero while the denominator is non-zero at these points. These values correspond to the nulls in the beam pattern. In particular, the main beam around $\Phi = 0$ has its nulls at $\pm 1/D_\lambda$; thus, the first-null beamwidth only depends on the normalized aperture length. The larger the aperture, the smaller the beamwidth, irrespective of whether the aperture is achieved using many antennas with small spacing or few antennas with large spacing. At the points $\Phi = 0$ and $\Phi = \pm 1/\Delta_\lambda$, where the beam pattern repeats itself, the numerator and denominator in (4.79) are both zero, which makes the function seemingly undefined. However, the limit value is $M$, as in the first row of (4.75), which represents the maximum beamforming gain. For brevity, we will not write that out explicitly when analyzing $A(\Phi)$, but remember that it is indeed a well-defined continuous function of $\Phi$.
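A convenient way to evaluate $A(\Phi)$ without treating the $0/0$ points separately is to compute the geometric sum in (4.75) instead of the closed form. The Python sketch below (assuming NumPy; the function name `A` mirrors the text but is otherwise illustrative) does this and checks three properties stated above for $M = 10$ and $\Delta_\lambda = 1/2$: the period $1/\Delta_\lambda$, the nulls (4.80), and agreement with the closed form (4.79) at a generic point.

```python
import numpy as np

def A(Phi, M, spacing_wl):
    """Beamforming gain A(Phi) of (4.79) for a ULA with spacing Delta_lambda = spacing_wl.

    Uses (1/M)|sum_m exp(j 2 pi m Delta_lambda Phi)|^2, which equals the closed form and
    returns the limit value M wherever numerator and denominator of (4.79) both vanish.
    """
    m = np.arange(M)[:, None]
    Phi = np.atleast_1d(np.asarray(Phi, dtype=float))[None, :]
    return np.abs(np.exp(2j * np.pi * m * spacing_wl * Phi).sum(axis=0)) ** 2 / M

M, dl = 10, 0.5
D = M * dl                                   # normalized aperture length D_lambda, (4.77)
print(A([0.0, 1 / dl, -1 / dl], M, dl))      # peak value M repeats with period 1/Delta_lambda
print(A(np.arange(1, M) / D, M, dl).max())   # nulls (4.80) at m / D_lambda: essentially zero
Phi = 0.13
closed_form = np.sin(np.pi * D * Phi) ** 2 / (M * np.sin(np.pi * dl * Phi) ** 2)
print(np.isclose(A(Phi, M, dl)[0], closed_form))   # True
```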

Figure 4.20 shows $A(\Phi)$ in dB-scale for a ULA with $M = 10$ antennas and an antenna spacing of $\Delta_\lambda = 1/2$ wavelengths, which results in a normalized aperture length of $D_\lambda = 5$ wavelengths. The purpose of this figure is to illustrate how the beam patterns for different values of $\varphi_{\mathrm{beam}}$ are obtained from $A(\Phi)$ for $\Phi \in [\sin(\varphi_{\mathrm{beam}}) - 1, \sin(\varphi_{\mathrm{beam}}) + 1]$. In the upper part of the figure, the red dash-dotted curve is obtained by beamforming in the broadside direction $\varphi_{\mathrm{beam}} = 0$, while the dotted green curve is obtained by beamforming in the end-fire direction $\varphi_{\mathrm{beam}} = \pi/2$. These curves are drawn as a function of $\varphi$ but are each obtained by taking the indicated intervals of $A(\Phi)$ in the lower part of the figure and “stretching” them out over all angles. The mapping of the horizontal axes is non-linear since $\Phi = \sin(\varphi_{\mathrm{beam}}) - \sin(\varphi)$ contains the sine function, but the shape along the vertical axis is unchanged.

The first-null beamwidth is $2/D_\lambda$ when considering $A(\Phi)$. In the case of $\varphi_{\mathrm{beam}} = 0$, we have $\varphi = -\arcsin(\Phi)$ and, thus, the beamwidth becomes

$$ 2\arcsin\left(\frac{1}{D_\lambda}\right) \approx \frac{2}{D_\lambda} \text{ radians} \tag{4.81} $$

when expressed in terms of the observation angle $\varphi$. The approximation in (4.81) is tight for $D_\lambda \geq 2.5$ since $\arcsin(x) \approx x$ holds very well for $x \in [0, 0.4]$. Hence, for arrays with aperture lengths beyond a few wavelengths, the beamwidth becomes inversely proportional to the aperture length (irrespective of the antenna spacing). The beamwidth widens as $\varphi_{\mathrm{beam}}$ increases and reaches its maximum in the end-fire direction. However, the heights of the main beam and side-lobes are the same in all these cases.
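How quickly the approximation in (4.81) becomes accurate can be tabulated with a few lines of Python (assuming NumPy; `first_null_beamwidth` is an illustrative name). At $D_\lambda = 2.5$, the relative error is below 3%, and it keeps shrinking as the aperture grows.

```python
import numpy as np

def first_null_beamwidth(D):
    """Exact broadside first-null beamwidth 2*arcsin(1/D) and approximation 2/D from (4.81), radians."""
    return 2 * np.arcsin(1 / D), 2 / D

for D in (1.0, 2.5, 5.0, 10.0):
    exact, approx = first_null_beamwidth(D)
    print(f"D = {D:4.1f}: exact {np.rad2deg(exact):6.2f} deg, approx {np.rad2deg(approx):6.2f} deg, "
          f"relative error {100 * (exact - approx) / exact:.1f} %")
```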

Since $A(\Phi)$ has a period of $1/\Delta_\lambda = 2$, the main beam at $\Phi = 0$ repeats itself at $\Phi = \pm 2$. This explains why the main beam is divided into two pieces on the green curve that utilizes the range $\Phi \in [0, 2]$. This was previously observed in Figure 4.19(a), where beamforming in one end-fire direction also resulted in a beam pointing in the opposite end-fire direction.

Using the established connection between $A(\Phi)$ and the beam patterns, we will further study how the antenna spacing affects $A(\Phi)$. Figure 4.21 shows $A(\Phi)$ for a ULA with a normalized aperture length of $D_\lambda = M\Delta_\lambda = 5$ wavelengths, but three different configurations:

1. $M = 20$ with $\Delta_\lambda = 1/4$ wavelengths;
2. $M = 10$ with $\Delta_\lambda = 1/2$ wavelengths;
3. $M = 5$ with $\Delta_\lambda = 1$ wavelength.

Figure 4.20: The function $A(\Phi)$ in (4.79) is shown in the bottom figure for $M = 10$ antennas and the antenna spacing $\Delta_\lambda = 1/2$ wavelengths. Depending on the beamforming direction $\varphi_{\mathrm{beam}}$, we take a certain interval $\Phi \in [\sin(\varphi_{\mathrm{beam}}) - 1, \sin(\varphi_{\mathrm{beam}}) + 1]$ from $A(\Phi)$ and use it to generate the resulting beam pattern at the top. The horizontal axis is stretched since $\Phi = \sin(\varphi_{\mathrm{beam}}) - \sin(\varphi)$. This is illustrated for $\varphi_{\mathrm{beam}} = 0$ and $\varphi_{\mathrm{beam}} = \pi/2$.

Figure 4.21: The function $A(\Phi)$ in (4.79) is shown for three different ULAs with a normalized aperture length of $D_\lambda = 5$ wavelengths. The first-null beamwidth and locations of other nulls are the same in all three cases, but the sizes of the side-lobes vary.


The second configuration is the half-wavelength-spacing case considered in Figure 4.20 and previously in this chapter. We will compare it to the first and third configurations. If we double the number of antennas to $M = 20$ while halving the antenna spacing, Figure 4.21 shows that the maximum beamforming gain of the main beam at $\Phi = 0$ is doubled, but the first-null beamwidth remains unchanged. This aligns with our analytical observation in (4.80) that only the normalized aperture length determines the null locations. The heights of the side-lobes are changed, but the number of side-lobes and their respective widths are identical. Recall from Figure 4.20 that the maximum beamforming gain reappears around $\Phi = \pm 2$ since $A(\Phi)$ has a period of $1/\Delta_\lambda = 2$. This phenomenon disappears when the antenna spacing is reduced to $\Delta_\lambda = 1/4$ because $A(\Phi)$ then has a period of $1/\Delta_\lambda = 4$, so the repetition falls outside the interval $\Phi \in [-2, 2]$ that we consider. Hence, we can now beamform in one end-fire direction without creating a beam in the opposite end-fire direction.

In the case of $M = 5$, $A(\Phi)$ has a period of $1/\Delta_\lambda = 1$; thus, the lobes at $\Phi = \pm 1$ are as strong as the main beam. These are called grating lobes and show that the array cannot distinguish between specific angular directions when the antenna spacing is $\lambda$. Apart from this phenomenon, the first-null beamwidth and the locations of the other side-lobes remain the same since these are only determined by the normalized aperture length of the ULA.
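The three configurations of Figure 4.21 can be compared at a few characteristic points with the following Python sketch (assuming NumPy; `A` is an illustrative re-implementation of (4.79) via the geometric sum). All three arrays share the first null at $\Phi = 1/D_\lambda = 0.2$, while only the $M = 5$ array with $\Delta_\lambda = 1$ reaches the full gain $M$ again at $\Phi = 1$, which is the grating lobe.

```python
import numpy as np

def A(Phi, M, spacing_wl):
    """A(Phi) in (4.79), evaluated via the geometric sum so that 0/0 points give the limit M."""
    m = np.arange(M)[:, None]
    Phi = np.atleast_1d(np.asarray(Phi, dtype=float))[None, :]
    return np.abs(np.exp(2j * np.pi * m * spacing_wl * Phi).sum(axis=0)) ** 2 / M

# Three ULAs with the same normalized aperture D_lambda = M * Delta_lambda = 5
for M, dl in ((20, 0.25), (10, 0.5), (5, 1.0)):
    main, first_null, at_one = A([0.0, 1 / (M * dl), 1.0], M, dl)
    print(f"M = {M:2d}, Delta_lambda = {dl}: A(0) = {main:.1f}, "
          f"A(1/D) = {first_null:.1e}, A(1) = {at_one:.1f}")
```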

Recall that the classical sampling theorem in Lemma 2.8 states that a signal must be sampled at least twice per period (of its largest frequency) to be uniquely distinguishable. We normally apply this theorem by letting the same device take samples at regular time instances. However, since a wireless signal propagates over the wireless medium, we can also take samples at the same time but at different spatial locations. The latter is what an antenna array does during reception. The array response vector $\mathbf{a}(\varphi)$ in (4.74) contains the entries

$$ e^{-j2\pi \frac{\sin(\varphi)}{\lambda} m\Delta}, \quad m = 0, \ldots, M-1, \tag{4.82} $$

which are obtained simultaneously by a ULA with the spatial antenna spacing $\Delta$. The same entries could alternatively be obtained as samples taken once every $\Delta$ seconds from a complex exponential with the frequency $\sin(\varphi)/\lambda$ Hz. When the wavelength is $\lambda$, the largest observable spatial frequencies (in the magnitude sense) are $\pm 1/\lambda$. The ULA will observe a complex exponential with those spatial frequencies when the signal impinges from the end-fire directions $\varphi = \pm\pi/2$. Since the period is $\lambda$ in this case, the sampling theorem dictates that complex exponentials with spatial frequencies in $[-1/\lambda, 1/\lambda)$ can only be uniquely distinguished from their samples if $\Delta \leq \lambda/2$ (i.e., sampling twice per period). In analogy to sampling at the Nyquist rate, a ULA with $\Delta_\lambda = 1/2$ is called a critically spaced array. A sparsely spaced array with $\Delta_\lambda > 1/2$ performs spatial undersampling and gives rise to grating lobes, which is a kind of spatial aliasing where some widely different directions are indistinguishable. A densely spaced array with $\Delta_\lambda < 1/2$ performs spatial oversampling, which cannot increase the spatial resolution, just as oversampling of a time-domain signal does not resolve any ambiguities since those disappear at the Nyquist rate. However, oversampling increases the maximum beamforming gain proportionally to $M$ for a given aperture length.

A subtle but important point is that a critically spaced array cannot distinguish between $-1/\lambda$ and $1/\lambda$, which is why the same DFT beam covers both end-fire directions in Figure 4.19. This is because the sampling theorem requires the spatial bandwidth to be strictly smaller than $2/\lambda$ (i.e., equality is not permitted) when sampling at the spatial Nyquist rate of $1/\Delta = 2/\lambda$ (i.e., $\Delta = \lambda/2$). This issue can be disregarded when the ULA uses directive antennas that cannot receive anything from the end-fire directions, as is the case for the cosine antenna in (1.34).
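Spatial aliasing can be seen directly in the entries (4.82). The short Python sketch below (assuming NumPy; `array_response` is an illustrative helper that takes the spacing in wavelengths, $\Delta_\lambda$) shows that with $\Delta = \lambda$ the directions with $\sin(\varphi) = 0.3$ and $\sin(\varphi) = -0.7$ produce identical array responses, that the two end-fire directions coincide even with critical spacing, and that the first pair becomes distinguishable at $\Delta = \lambda/2$.

```python
import numpy as np

def array_response(M, spacing_wl, sin_phi):
    """Entries (4.82): exp(-j 2 pi (sin(phi)/lambda) m Delta), with Delta = spacing_wl * lambda."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * sin_phi * m * spacing_wl)

M = 5
# Sparse spacing (Delta = lambda): sin(phi) = 0.3 and 0.3 - 1 = -0.7 give identical samples.
print(np.allclose(array_response(M, 1.0, 0.3), array_response(M, 1.0, -0.7)))   # True
# Critical spacing (Delta = lambda/2): the end-fire directions sin(phi) = +1 and -1 coincide.
print(np.allclose(array_response(M, 0.5, 1.0), array_response(M, 0.5, -1.0)))   # True
# Critical spacing distinguishes sin(phi) = 0.3 from -0.7.
print(np.allclose(array_response(M, 0.5, 0.3), array_response(M, 0.5, -0.7)))   # False
```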

We will now let the number of antennas be fixed but vary the antenna spacing. Figure 4.22(a) shows $A(\Phi)$ with $M = 10$ and either $\Delta_\lambda = 1/2$ (critically spaced) or $\Delta_\lambda = 1/4$ (densely spaced). The aperture length is smaller in the latter case since the number of antennas is fixed. The widths of the main beam and side-lobes increase when the antenna spacing is reduced; recall that the distance between the nulls in (4.80) increases when the normalized aperture length shrinks. The opposite result is seen in Figure 4.22(b), where we compare $\Delta_\lambda = 1/2$ (critically spaced) and $\Delta_\lambda = 1$ (sparsely spaced). A larger antenna spacing results in a narrower main beam but also gives rise to grating lobes. The main beam and the grating lobes always have the same width. Grating lobes are undesirable if we want to extract information from the channel, such as determining the angle to the receiver, because we cannot distinguish whether the received signal power is strong because the main beam points to the receiver or because one of the grating lobes does. This ambiguity is primarily a concern in radar and not in communications, where we can even benefit from the fact that the main beam is narrower when there are grating lobes: it gives a higher spatial resolution around the intended beamforming direction, which might improve the spatial multiplexing capability.

(b) Comparison between critically spaced and sparsely spaced arrays.

Figure 4.22: The function $A(\Phi)$ in (4.79) is shown for ULAs with $M = 10$ antennas, but either critical spacing ($\Delta = \lambda/2$), dense spacing ($\Delta = \lambda/4$), or sparse spacing ($\Delta = \lambda$). A larger antenna spacing leads to a smaller beamwidth but will also give rise to grating lobes when the spacing is larger than $\lambda/2$.


Example 4.9. Consider a ULA deployed vertically in a mast to serve user devices on the ground. The potential users are located in the angular interval $[0, \pi/2]$, so no grating lobes are allowed in this interval when beamforming towards the users. How should the antenna spacing be selected for a given number of antennas, $M$, to minimize the beamwidth?

The beamwidth is inversely proportional to the normalized aperture length $D_\lambda = M\Delta_\lambda$. Since $M$ is fixed, the beamwidth can be minimized by selecting the largest permitted antenna spacing $\Delta_\lambda$. The function $A(\Phi)$ in (4.79) is periodic with period $1/\Delta_\lambda$, so we need $|\Phi| < 1/\Delta_\lambda$ for all the values of $\Phi = \sin(\varphi_{\mathrm{beam}}) - \sin(\varphi)$ that appear in this deployment scenario. We might send a beam to a user device in any direction $\varphi_{\mathrm{beam}} \in [0, \pi/2]$ and have prospective receivers in any direction $\varphi \in [0, \pi/2]$. Hence, we require that

$$ \max_{\varphi_{\mathrm{beam}},\, \varphi \in [0, \pi/2]} \left|\sin(\varphi_{\mathrm{beam}}) - \sin(\varphi)\right| < \frac{1}{\Delta_\lambda}. \tag{4.83} $$

Since the sine function takes values between 0 and 1 in the given interval, the maximum difference is 1. As a result, we need to guarantee that

$$ 1 < \frac{1}{\Delta_\lambda} \quad \Longleftrightarrow \quad \Delta_\lambda < 1 \text{ wavelength}. \tag{4.84} $$

In conclusion, we achieve the smallest beamwidth (highest spatial resolution) with an antenna spacing of (just under) one wavelength. This sparsely spaced array achieves beamwidths roughly half as wide as with the corresponding critically spaced array. The price to pay is that grating lobes are sent into the sky, but this will not cause interference to the users on the ground.


The conclusion is that $\Delta = \lambda/2$ is often the preferred antenna spacing because, for a given number of antennas, it gives the smallest beamwidth achievable without grating lobes (i.e., spatial aliasing), except in the end-fire directions. This is why we considered this spacing earlier in the chapter and will continue doing so in the remainder of this book. However, in situations where grating lobes are acceptable in certain angular intervals, increasing the antenna spacing to reduce the beamwidth in other intervals can be desirable.
4.4 Modeling of Line-of-Sight MIMO Channels


We will now reuse the analysis from the SIMO and MISO cases to characterize the point-to-point MIMO channel matrix $\mathbf{H}$. We assume there are $K$ transmit antennas and $M$ receive antennas. We let $d_{m,k}$ denote the distance between transmit antenna $k$ and receive antenna $m$. A detailed derivation of the MIMO channel can be obtained by following the same approach as in Section 4.2.1, but we will only provide the main results. The transmitter and receiver are time-synchronized, meaning that the receiver samples the received signal $\tau = d/c$ seconds after the transmission, where $d$ is a reference distance between the transmitter and receiver. The channel response $h_{m,k}$ between transmit antenna $k$ and receive antenna $m$ is then obtained (similar to (4.10)) as

$$ h_{m,k} = \sqrt{\beta_{m,k}}\, e^{-j2\pi \frac{d_{m,k} - d}{\lambda}}, \tag{4.85} $$

where the phase-shift is $2\pi \frac{d_{m,k} - d}{\lambda}$ and the channel gain is

$$ \beta_{m,k} = \frac{\lambda^2}{(4\pi)^2 d_{m,k}^2}. \tag{4.86} $$


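By gathering all the channel responses in an $M \times K$ channel matrix, we obtain

$$ \mathbf{H} = \begin{bmatrix} h_{1,1} & \cdots & h_{1,K} \\ \vdots & \ddots & \vdots \\ h_{M,1} & \cdots & h_{M,K} \end{bmatrix} = \begin{bmatrix} \sqrt{\beta_{1,1}}\, e^{-j2\pi \frac{d_{1,1} - d}{\lambda}} & \cdots & \sqrt{\beta_{1,K}}\, e^{-j2\pi \frac{d_{1,K} - d}{\lambda}} \\ \vdots & \ddots & \vdots \\ \sqrt{\beta_{M,1}}\, e^{-j2\pi \frac{d_{M,1} - d}{\lambda}} & \cdots & \sqrt{\beta_{M,K}}\, e^{-j2\pi \frac{d_{M,K} - d}{\lambda}} \end{bmatrix}. \tag{4.87} $$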


This channel matrix applies to any MIMO LOS setup, regardless of the 
antenna array geometries or distances. The channel capacity can be computed 
using Theorem 3.1. In the following sections, we will analyze three specific 
cases to shed light on the interplay between array deployment and capacity. 


4.4.1 MIMO Channel Capacity with ULAs and Planar Wavefronts 


We assume that the transmitter and receiver are equipped with ULAs with the same antenna spacing $\Delta$ to gain further insights into the channel matrix properties. Moreover, when synchronizing the transmitter and receiver, we use the distance $d = d_{1,1}$ between the first antennas in each array as the reference distance. The transmitter and receiver are assumed to be located in the same two-dimensional plane (e.g., at the same height above the ground).⁹ We will use the same approximations as in the SIMO and MISO cases: frequency flatness and that each antenna is in the far-field of the other array. The latter means that $d > 2M^2\Delta^2/\lambda$ and $d > 2K^2\Delta^2/\lambda$ according to (4.16). Under these approximations, valid in many practical scenarios, there is a common angle-of-departure $\varphi_t$ for all transmit antennas and a common angle-of-arrival $\varphi_r$ among all the receive antennas. We illustrate this setup in Figure 4.23. As shown in the figure, the distance $d_{m,k}$ can be (approximately) computed as

$$ d_{m,k} = d + (k-1)\Delta\sin(\varphi_t) + (m-1)\Delta\sin(\varphi_r), \tag{4.88} $$

which is the reference distance $d$ plus two additional terms that cause phase differences among the transmit and receive antennas, respectively. These terms are computed trigonometrically, as shown in the figure. The term $(k-1)\Delta\sin(\varphi_t)$ represents the extra propagation distance at the transmitter side, while $(m-1)\Delta\sin(\varphi_r)$ represents the extra propagation distance at the receiver side. Their values can be either positive or negative, depending on the angles, and give rise to different phase-shifts between every pair of antennas.

⁹ This is a limiting assumption in the MIMO case, which, for instance, does not cover the case when one ULA is deployed horizontally and the other ULA is deployed vertically. The general case requires other angles to be defined and adjustments to be made in the channel model, but the main conclusions drawn in this section will not change.

Figure 4.23: Illustration of a MIMO communication setup where the transmitter is equipped with a ULA with $K$ antennas and the receiver is equipped with a ULA with $M$ antennas. The antenna spacing is $\Delta$ in each array and the distance between transmit antenna $k$ and receive antenna $m$ is denoted by $d_{m,k}$. The figure shows a far-field scenario where the angle-of-departure is $\varphi_t$ for all the transmit antennas, while the angle-of-arrival is $\varphi_r$ for all the receive antennas.


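The far-field assumption also implies that there is a common channel gain

$$ \beta = \frac{\lambda^2}{(4\pi)^2 d^2} \tag{4.89} $$

between any pair of transmit and receive antennas because $d$ is much larger than the latter two terms in (4.88). Hence, $\beta_{m,k} \approx \beta$ for all $m$ and $k$. Under these far-field conditions, the $M \times K$ channel matrix in (4.87) can be simplified as

$$ \mathbf{H} = \begin{bmatrix} h_{1,1} & \cdots & h_{1,K} \\ \vdots & \ddots & \vdots \\ h_{M,1} & \cdots & h_{M,K} \end{bmatrix} = \sqrt{\beta} \begin{bmatrix} 1 \\ e^{-j2\pi \frac{\Delta}{\lambda}\sin(\varphi_r)} \\ \vdots \\ e^{-j2\pi \frac{(M-1)\Delta}{\lambda}\sin(\varphi_r)} \end{bmatrix} \begin{bmatrix} 1 & e^{-j2\pi \frac{\Delta}{\lambda}\sin(\varphi_t)} & \cdots & e^{-j2\pi \frac{(K-1)\Delta}{\lambda}\sin(\varphi_t)} \end{bmatrix}. \tag{4.90} $$

Interestingly, (4.90) shows that the matrix $\mathbf{H}$ can be written as an outer product of two vectors when considering free-space LOS channels under the far-field assumption. The two vectors are the channel vectors that one would get with a SIMO channel ($K = 1$) and a MISO channel ($M = 1$), except that the channel gain $\beta$ only appears once in the expression. The channel matrix in (4.90) is derived for arbitrary antenna spacings, but it is common to consider $\Delta = \lambda/2$. In that special case, we can utilize the array response vector defined in (4.49) to write (4.90) as

$$ \mathbf{H} = \sqrt{\beta}\, \mathbf{a}_M(\varphi_r)\, \mathbf{a}_K^{\mathrm{T}}(\varphi_t). \tag{4.91} $$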


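We will now compute the capacity of MIMO channels that can be described using the channel matrix in (4.90). We recall that the MIMO channel capacity in Theorem 3.1 depends on the non-zero singular values of $\mathbf{H}$. Since the channel matrix in (4.90) is the outer product of two vectors, it is a matrix with rank one. We can then write its SVD as

$$ \mathbf{H} = s_1 \mathbf{u}_1 \mathbf{v}_1^{\mathrm{H}}, \tag{4.92} $$

where

$$ s_1 = \sqrt{\beta M K} \tag{4.93} $$

is the only non-zero singular value and the unit-length left and right singular vectors are given by the following normalized array response vectors:

$$ \mathbf{u}_1 = \frac{1}{\sqrt{M}}\mathbf{a}_M(\varphi_r) = \frac{1}{\sqrt{M}} \begin{bmatrix} 1 \\ \vdots \\ e^{-j2\pi \frac{(M-1)\Delta}{\lambda}\sin(\varphi_r)} \end{bmatrix}, \tag{4.94} $$

$$ \mathbf{v}_1 = \frac{1}{\sqrt{K}}\mathbf{a}_K^*(\varphi_t) = \frac{1}{\sqrt{K}} \begin{bmatrix} 1 \\ \vdots \\ e^{+j2\pi \frac{(K-1)\Delta}{\lambda}\sin(\varphi_t)} \end{bmatrix}. \tag{4.95} $$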
If we substitute $s_1$ and $r = 1$ into (3.75), the MIMO channel capacity becomes

$$ C = \log_2\left(1 + \frac{q_1^{\mathrm{opt}} s_1^2}{N_0}\right) = \log_2\left(1 + \frac{q\beta M K}{N_0}\right), \tag{4.96} $$

where we utilized that $q_1^{\mathrm{opt}} = q$ when there is only one non-zero singular value.
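The rank-one structure and the singular value (4.93) can be checked numerically. The following Python sketch (assuming NumPy; the function name `far_field_los_channel`, the angles 20° and −35°, and the other parameter values are arbitrary illustrative choices) builds $\mathbf{H}$ from the far-field distances (4.88) and the common gain (4.89), then compares the singular values with $\sqrt{\beta M K}$ and zero.

```python
import numpy as np

def far_field_los_channel(M, K, d, wavelength, spacing, phi_t, phi_r):
    """M x K LOS channel from (4.85) with far-field distances (4.88) and common gain (4.89)."""
    m = np.arange(M)[:, None]                # receive antenna index minus one
    k = np.arange(K)[None, :]                # transmit antenna index minus one
    extra = k * spacing * np.sin(phi_t) + m * spacing * np.sin(phi_r)   # d_{m,k} - d from (4.88)
    beta = wavelength ** 2 / ((4 * np.pi) ** 2 * d ** 2)                # (4.89)
    return np.sqrt(beta) * np.exp(-2j * np.pi * extra / wavelength), beta

lam = 0.01
M, K = 64, 8
H, beta = far_field_los_channel(M, K, d=100.0, wavelength=lam, spacing=lam / 2,
                                phi_t=np.deg2rad(20), phi_r=np.deg2rad(-35))
s = np.linalg.svd(H, compute_uv=False)
print(s[0] / np.sqrt(beta * M * K), s[1] / s[0])   # about 1 and about 0: rank one
```

The phase is computed from the extra distance $d_{m,k} - d$ directly rather than by subtracting $d$ from $d_{m,k}$, which avoids needless floating-point cancellation.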


Example 4.10. An early 5G demonstration reached a data rate of 4.3 Gbit/s over a point-to-point LOS channel using $B = 800$ MHz of mmWave spectrum. This example will consider how this value might have been achieved. Suppose the wavelength is $\lambda = 10$ mm, the transmit power is $P = 10$ W, and the noise power spectral density is $N_0 = 10^{-17}$ W/Hz.

(a) If $M = K = 1$ isotropic antennas were used, how large was the propagation distance?

(b) If $M = 64$ and $K = 8$ isotropic antennas were used, how large was the propagation distance?

The capacity of the system is $B\log_2(1 + \mathrm{SNR}) = 4.3 \cdot 10^9$ bit/s, which requires an SNR value of $2^{4.3 \cdot 10^9/(8 \cdot 10^8)} - 1 \approx 40.5$ for a bandwidth of $8 \cdot 10^8$ Hz.

(a) The SNR in the SISO case is $\mathrm{SNR} = \frac{P\lambda^2}{(4\pi)^2 d^2 B N_0}$. If we equate it to 40.5 and solve for the distance $d$, we obtain

$$ d = \sqrt{\frac{P\lambda^2}{(4\pi)^2 B N_0\, \mathrm{SNR}}} = \sqrt{\frac{10 \cdot (10^{-2})^2}{(4\pi)^2 \cdot 8 \cdot 10^8 \cdot 10^{-17} \cdot 40.5}} \approx 4.4 \text{ m}. \tag{4.97} $$

(b) The SNR over a $64 \times 8$ LOS MIMO channel can be extracted from (4.96) as $\mathrm{SNR} = \frac{P\lambda^2}{(4\pi)^2 d^2 B N_0} MK$, where $MK = 512$. The SNR should still be 40.5, but since the numerator of the SNR has increased by a factor of 512, the squared propagation distance $d^2$ can increase by the same factor. Hence, the distance is increased to $d = 4.4\sqrt{512} \approx 100$ m.
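The numbers in Example 4.10 can be reproduced with a few lines of Python (assuming NumPy; variable names are illustrative), using $\mathrm{SNR} = P\lambda^2/((4\pi)^2 d^2 B N_0)$ in the SISO case and the extra factor $MK$ in the LOS MIMO case.

```python
import numpy as np

B, rate = 800e6, 4.3e9            # bandwidth [Hz], target data rate [bit/s]
lam, P, N0 = 10e-3, 10.0, 1e-17   # wavelength [m], transmit power [W], noise PSD [W/Hz]
M, K = 64, 8

snr = 2 ** (rate / B) - 1                                          # from B*log2(1 + SNR) = rate
d_siso = np.sqrt(P * lam ** 2 / ((4 * np.pi) ** 2 * B * N0 * snr))  # (4.97)
d_mimo = d_siso * np.sqrt(M * K)                                   # SNR scales with MK, d^2 likewise
print(f"SNR = {snr:.1f}, SISO distance = {d_siso:.1f} m, 64x8 MIMO distance = {d_mimo:.0f} m")
```

This should print an SNR of 40.5, a SISO distance of 4.4 m, and a MIMO distance of 100 m.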


When the point-to-point MIMO channel capacity was discussed in Sec- 
tion 3.4, a major distinguishing factor from the SIMO and MISO capacities 
was the existence of a multiplexing gain; that is, the ability to transmit r > 1 
parallel data streams, so that channel capacity grows proportional to r (par- 
ticularly at high SNR). Since the multiplexing gain is multiplied in front of 
the logarithm in the capacity expression, it can improve the capacity much 
more than the beamforming gain, which appears inside the logarithm. Unfor- 
tunately, the spatial multiplexing gain cannot be harnessed in the considered 
LOS setup since r = 1. Only a beamforming gain of MK appears in (4.96), in 
the sense that the SNR is MK times larger than for the corresponding SISO system. The main reason is the far-field situation: the angle-of-departure $\varphi_t$ is (approximately) the same from all the transmit antennas towards all the receive antennas. Hence, when the transmitter forms a beam towards the center of the receiver array, all the receive antennas are at the center of the main beam. Recall from (3.62) that the capacity is achieved using the left and right singular vectors to turn the channel into r parallel channels. In this case, we only get one such channel, achieved by transmitting the signal using the precoding vector $\mathbf{v}_1$ and processing the received signal using the combining vector $\mathbf{u}_1$. As illustrated in Figure 4.24, this is the same thing as performing MRT at the transmitter based on its MISO channel to the receiver, which yields $\mathbf{v}_1$, followed by MRC at the receiver based on its SIMO channel from the transmitter, which yields $\mathbf{u}_1$. Hence, the transmitter and receiver can compute their precoding/combining independently without knowing how many antennas the other device has. To get a rank r > 1, the transmitter must be able to transmit multiple beams
that are distinguishable at the receiver. This can happen when there are 
scattering objects that the signal can bounce off, as previously illustrated in 
Figure 3.16, but not in the considered setup. 

Despite the lack of a multiplexing gain, the beamforming gain is larger 
in MIMO channels compared to SIMO and MISO channels having the same 
total number of antennas. The following example demonstrates the benefit of 
having multiple antennas on both sides of a communication system. 


Example 4.11. If the total number of transmit and receive antennas must satisfy M + K = c, for some integer c, how should we distribute them between the transmitter and receiver to maximize capacity?

The SNR is proportional to MK in the MIMO capacity expression in (4.96). Since we have the condition M + K = c, we can rewrite this beamforming gain as MK = M(c − M). The first-order derivative with respect to M is c − 2M, and by equating it to zero, we find that the beamforming gain is maximized if M = c/2 (the second-order derivative is −2 < 0, so this is indeed a maximum). Hence, we maximize the capacity by dividing the antennas equally between the transmitter and receiver.
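Since M and K must be integers, the optimum can also be found by simply enumerating all splits. The following sketch (the function name `best_split` is hypothetical) does this; for odd c, it returns one of the two splits closest to c/2.

```python
def best_split(c):
    """Return (M, K, M*K) maximizing the beamforming gain M*K subject to M + K = c and M, K >= 1."""
    return max(((m, c - m, m * (c - m)) for m in range(1, c)), key=lambda t: t[2])

print(best_split(10))  # → (5, 5, 25), compared with MK = 9 for the SIMO split M = 9, K = 1
print(best_split(11))  # odd c: the first maximizer found, M = 5 and K = 6
```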


Suppose we have M + K = 10 antennas that can be deployed on the 
transmitter or the receiver. If we create a SIMO system with M = 9 and 
K = 1, the beamforming gain is MK = 9. However, if we create a MIMO 
system with M = 5 and K = 5, the beamforming gain is MK = 25. Figure 4.25 
shows how the corresponding channel capacities depend on the SNR. Since 
the SNR of a particular data signal depends on both the number of antennas 
and whether they are used for beamforming or multiplexing, we define the 
reference SNR as 


$$\mathrm{SNR} = \frac{q\beta}{N_0}. \tag{4.98}$$


This is the SNR that a SISO system achieves under the same propagation conditions and is used on the horizontal axis in Figure 4.25. All the curves have the same slope since the multiplexing gain is r = 1 in all the considered cases, but the beamforming gain shifts the curves toward the left so that a higher capacity is achieved in the MIMO setup for any given SNR. In conclusion, it is beneficial to deploy a MIMO system even in a far-field LOS scenario with ULAs, although it is disappointing that there is no multiplexing gain.

Figure 4.24: An LOS channel between two ULAs only features one propagation path between the transmitter and the receiver. The multiplexing gain is r = 1 and the SVD of the channel matrix can be expressed as $\mathbf{H} = s_1\mathbf{u}_1\mathbf{v}_1^{\mathsf{H}}$.

Figure 4.25: The capacity in the MIMO, SIMO/MISO, and SISO cases over far-field LOS channels. The MIMO capacity is $\log_2(1 + 25\,\mathrm{SNR})$ and the SIMO/MISO capacity is $\log_2(1 + 9\,\mathrm{SNR})$, but the total number of antennas is 10 in both scenarios. The SISO capacity $\log_2(1 + \mathrm{SNR})$ is also shown and its SNR is used as the reference SNR.
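The curves in Figure 4.25 follow directly from (4.96). A minimal sketch that evaluates them at a few reference SNR values (the SNR points and the function name `los_capacity` are illustrative choices):

```python
import math

def los_capacity(gain, snr_db):
    """Rank-one LOS capacity log2(1 + gain*SNR) in bit/symbol; SNR is the reference SNR (4.98) in dB."""
    snr = 10 ** (snr_db / 10)
    return math.log2(1 + gain * snr)

for snr_db in (-10, 0, 10, 20):
    print(f"{snr_db:>4} dB: MIMO (MK = 25) {los_capacity(25, snr_db):.2f}, "
          f"SIMO/MISO (MK = 9) {los_capacity(9, snr_db):.2f}, SISO {los_capacity(1, snr_db):.2f}")

# At high SNR, the MIMO and SIMO/MISO curves differ by the constant log2(25/9) bit/symbol,
# which corresponds to a horizontal shift of 10*log10(25/9) dB (about 4.4 dB)
gap = los_capacity(25, 60) - los_capacity(9, 60)
```

The constant gap at high SNR is the signature of a pure beamforming gain: it shifts the curve but does not change its slope.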
Figure 4.26: Illustration of a MIMO communication setup where the transmitter has a ULA with antenna spacing Δ. The receiver is equipped with a distributed array deployed along the arc of a circle with the radius d. The antennas are uniformly spaced over a circle sector with the central angle ϑ.


4.4.2 MIMO Channel Capacity with Distributed Antennas 


The rank deficiency of the MIMO channel matrix that we observed in the last 
section is inherent in the point-to-point terminology. If we transmit from an 
array at one location to an array at another location, and the propagation 
distance is large, then the receiver will be at the center of the main beam 
and can only identify the angular direction of the incoming plane wave; it 
cannot distinguish between the individual transmit antennas. Similarly, if 
one watches a brick wall from a distance, one can identify the location of the 
building but not distinguish individual bricks. 

In this section, we will demonstrate that it is the plane-wave/far-field assumption that implies the rank-one channel matrix. If we spread out the antennas in one of the arrays to the point where the spherical wavefronts become noticeable, the rank of the channel matrix will increase. One such setup is illustrated in Figure 4.26, where the transmitter is equipped with a ULA while the receiver is equipped with a distributed array of antennas. For simplicity, we assume all the receive antennas are at the same distance d from the transmitter's center; thus, the antennas are deployed on the arc of a circle with radius d. As illustrated in the figure, the angular difference between the outermost receive antennas is denoted by ϑ, and the antennas are uniformly spaced on the arc. We can obtain the exact channel matrix based on these geometrical assumptions using (4.87), without making a plane-wave approximation, and then compute the channel capacity using Theorem 3.1.

Figure 4.27 exemplifies the capacity in a setup with M = K = 4 antennas and d = 2000λ (e.g., 200 m if λ = 0.1 m). The transmitter has a ULA with Δ = λ/2 spacing. The receiver either has an identical compact ULA or a distributed array of the kind illustrated in Figure 4.26 with either ϑ = 20° or ϑ = 75°. The figure shows how the channel capacity depends on the reference SNR from (4.98), measured as in a SISO system. Recall that the multiplexing gain determines the high-SNR slope of a capacity curve. We notice that the slope differs between the curves; thus, the multiplexing gains differ. While a receiver equipped with a compact ULA only achieves a multiplexing gain of r = 1, a receiver with a distributed array can achieve a larger multiplexing gain and, thereby, a steeper slope. The benefit of distributing the antennas comes gradually as ϑ increases. The full multiplexing gain r = 4 is achieved for ϑ = 75°, but not for ϑ = 20°.¹⁰ The reference curve $4\log_2(1 + \mathrm{SNR})$ is included to represent the ideal case when all the singular values of H are equal. The setup with ϑ = 75° has the same slope but is slightly shifted to the right since the singular values are unequal.

Figure 4.27: Capacity of LOS MIMO channels with M = K = 4 where the transmitter is equipped with a ULA, while the receiver is either equipped with a half-wavelength-spaced ULA or a distributed array of the kind illustrated in Figure 4.26. The ideal MIMO capacity $4\log_2(1 + \mathrm{SNR})$ is shown as a reference.

We can achieve a larger multiplexing gain with a distributed array because a single beam is too narrow to cover the entire receiver array. The half-power beamwidth can be computed for the simulation example using (4.58) and becomes $2\arcsin(0.886/4) \approx 0.45$ radians ≈ 26°. The angle difference between the adjacent receive antennas is 25° when ϑ = 75°; hence, if we point one beam towards each receive antenna, as illustrated in Figure 4.28, they will barely overlap. This explains why we can nearly reach the ideal MIMO capacity in this setup. All the signals are sent from all the transmit antennas to achieve narrow beams and reach all the receive antennas, but with varying amplitudes and phases, so we can use the SVD to create four parallel channels that are almost equally strong. In practice, this property can be utilized by deploying base stations at different locations and serving each user device using all of them to create a MIMO channel with a high rank. Such systems are called cell-free MIMO [2] or coordinated multipoint [52].

¹⁰ Strictly speaking, the rank of H is 4 in all the considered setups because all the singular values are non-zero. However, it is not until the water-filling power allocation uses all the parallel channels that the effective multiplexing gain becomes 4 and the slope increases to its maximum. It is only for ϑ = 75° that all the parallel channels are utilized within the considered SNR range, while the other setups have several singular values that are negligibly small.

Figure 4.28: Illustration of how a transmitter with four antennas can transmit a superposition of four beams, each carrying different data. Each beam has a different direction and focuses on a different part of the receiver's distributed array. This is how the MIMO channel capacity is achieved when the receiver is so large that each main beam only reaches one antenna.

If we shift focus to the low-SNR regime, we can notice from Figure 4.27 
that the ULA outperforms the two distributed arrays in this case (but the 
margin is smaller for ϑ = 20°). The reason is that the water-filling power
allocation will only utilize the subchannel with the largest singular value in 
this case; thus, having a rank-one channel is preferable at low SNR because 
then the maximum beamforming gain of MK can be achieved. 


Example 4.12. Consider a MIMO system with M = K = 2 antennas. For 
which SNR values will we achieve a higher capacity with a full-rank channel 
matrix with two identical singular values than a rank-one channel matrix? 

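A rank-one channel has the capacity $\log_2(1 + MK\,\mathrm{SNR}) = \log_2(1 + 4\,\mathrm{SNR})$ from (4.96), where SNR denotes the SNR of a corresponding SISO channel. If the singular values are equal, the capacity becomes $2\log_2(1 + \mathrm{SNR})$, where there is a multiplexing gain, but splitting the power equally between the two subchannels cancels the beamforming gain. The full-rank channel matrix achieves a higher capacity if
$$2\log_2(1 + \mathrm{SNR}) > \log_2(1 + 4\,\mathrm{SNR}) \;\Leftrightarrow\; (1 + \mathrm{SNR})^2 > 1 + 4\,\mathrm{SNR} \;\Leftrightarrow\; \mathrm{SNR} > 2, \tag{4.99}$$
which corresponds to about 3 dB.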
A similar computation for the case M = K = 4 that was considered in Figure 4.27 would result in $(1 + \mathrm{SNR})^4 > 1 + 16\,\mathrm{SNR}$, which implies $\mathrm{SNR} \gtrsim 1.06 \approx 0.25$ dB (which is an approximate number). We can observe that this is the intersection point between the ULA curve and reference curve in Figure 4.27. The SNR at the intersection point gradually decreases as we continue increasing the number of antennas.
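The intersection points can be found numerically. The sketch below uses bisection, a simple root-finding method chosen here for illustration (the text does not state how the approximate number was obtained), and assumes M = K = n so that the rank-one beamforming gain is $n^2$. The function name `crossover_snr` is hypothetical.

```python
import math

def crossover_snr(n, lo=1e-6, hi=100.0, iters=200):
    """Reference SNR (linear) where n*log2(1 + SNR) = log2(1 + n^2 * SNR), for M = K = n >= 2.

    The left side is the ideal capacity with n equal singular values, the right side is the
    rank-one capacity (4.96). Their difference has a single minimum at SNR = 1/n, so there is
    exactly one crossing in [lo, hi], which bisection locates.
    """
    def diff(x):
        return n * math.log2(1 + x) - math.log2(1 + n * n * x)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if diff(mid) < 0:   # rank-one channel still better: crossing lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (2, 4, 8):
    x = crossover_snr(n)
    print(f"M = K = {n}: SNR = {x:.3f} ({10 * math.log10(x):.2f} dB)")
```

For n = 2, the exact crossing SNR = 2 from (4.99) is recovered, and the crossing moves to lower SNR as n grows.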


4.4.3 MIMO Channel Capacity in the Radiative Near-Field 


The last section demonstrated that the rank of an LOS MIMO channel matrix 
becomes larger than 1 when the receiving array has a size larger than the 
transmitted signal’s beamwidth. This might happen even if the antennas 
are gathered in a compact aperture but not under the far-field conditions 
considered earlier. In this section, we instead consider the radiative near-field, 
where the spherical curvature of the signals helps to increase the channel rank. 

The half-power beamwidth of a ULA transmitting in the broadside direction was shown in (4.58) to be $2\arcsin(0.886/M)$ when the antenna spacing is Δ = λ/2. The beamwidth with an arbitrary antenna spacing Δ can be expressed as
$$\psi = 2\arcsin\left(\frac{0.886\lambda}{2M\Delta}\right) = 2\arcsin\left(\frac{0.443\lambda}{D_t}\right) \approx \frac{0.886\lambda}{D_t} \text{ radians}, \tag{4.100}$$
where $D_t = M\Delta$ denotes the aperture length of the transmitting ULA. The approximation in (4.100) is based on $\arcsin(x) \approx x$ for $x \in [0, 0.4]$ and is therefore tight when the aperture length is at least a few wavelengths. The expression in (4.100) shows that the beamwidth is narrow when either the wavelength λ is small or the aperture length $D_t$ is large. Since the beam has a constant angular width, the physical width measured in meters grows linearly with the propagation distance. At a distance d from the transmitter, the physical beamwidth becomes
$$2d\tan\left(\frac{\psi}{2}\right) \approx 2d\,\frac{\psi}{2} = d\psi \approx \frac{0.886\lambda d}{D_t} \text{ meters}, \tag{4.101}$$


where the approximations once again follow from the assumption that the angles are small. Figure 4.29 illustrates this relationship between the aperture length, half-power angular beamwidth, and physical beamwidth at a distance d. If the receiver's aperture length is smaller than half the physical beamwidth when the beam is focused on the outermost antennas, as shown in the figure, we can expect the channel matrix to have rank 1. The rank is higher when the receiver's aperture length $D_r$ is larger than half the physical beamwidth, ψd/2, because then we can focus different beams on the two outermost antennas (top and bottom) and have limited overlap. In practice, the aperture lengths of the transmitter and receiver are limited by the physical sizes of the respective devices. However, we can achieve a high-rank channel by reducing the wavelength (i.e., increasing the carrier frequency), which is a distinct benefit of using the high-band spectrum in LOS scenarios.

Figure 4.29: The half-power beamwidth $\psi \approx 0.886\lambda/D_t$ measures the angular width of the transmitted beam containing most of the power. The physical beamwidth (in meters) experienced at a distance d from the transmitter is approximately ψd. If the beam is focused on the outermost antenna, half the physical beamwidth should be compared with the receiver's aperture length $D_r$. If $D_r$ is smaller than ψd/2, as exemplified here, the channel matrix will have rank 1.
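The expressions (4.100) and (4.101) are easy to evaluate. This sketch (hypothetical helper names, illustrative values) checks the 26° beamwidth used for the M = 4, Δ = λ/2 example in Section 4.4.2, where $D_t = 2\lambda$, and computes a physical beamwidth at 30 GHz.

```python
import math

def half_power_beamwidth(wavelength, aperture):
    """Half-power beamwidth psi from (4.100): arcsin form and small-angle approximation, in radians."""
    arcsin_form = 2 * math.asin(0.443 * wavelength / aperture)
    approx = 0.886 * wavelength / aperture
    return arcsin_form, approx

def physical_beamwidth(wavelength, aperture, distance):
    """Approximate physical beamwidth psi*d in meters from (4.101)."""
    return 0.886 * wavelength * distance / aperture

# M = 4 antennas with half-wavelength spacing: D_t = 2*lambda (lambda = 1 for convenience)
psi, psi_approx = half_power_beamwidth(1.0, 2.0)
print(f"{math.degrees(psi):.1f} deg (small-angle approximation: {math.degrees(psi_approx):.1f} deg)")

# lambda = 1 cm, D_t = 1 m, d = 100 m: the beam is under a meter wide
print(f"{physical_beamwidth(0.01, 1.0, 100.0):.3f} m")
```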


Example 4.13. Suppose the transmitter and receiver are equipped with ULAs with the aperture lengths $D_t = D_r = 1$ m. For which propagation distances d is half the physical beamwidth in (4.101) smaller than the receiver's aperture? Consider the wavelengths λ = 1 dm (3 GHz) and λ = 1 cm (30 GHz).

Half the physical beamwidth in (4.101) is smaller than $D_r$ if
$$\frac{1}{2}\cdot\frac{0.886\lambda d}{D_t} < D_r \quad\Leftrightarrow\quad d < \frac{2D_tD_r}{0.886\lambda}. \tag{4.102}$$
The upper bound is similar to the Fraunhofer distance $2D^2/\lambda$ if $D = D_t = D_r$. Hence, the far-field plane-wave approximation is not applicable when half the transmitter's physical beamwidth is smaller than the receiver's aperture length. We obtain the range d < 22.6 m if λ = 1 dm and d < 226 m if λ = 1 cm. Hence, for practically sized arrays, we can utilize spherical wavefronts to achieve a high-rank LOS channel if the distance and/or wavelength is small.
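A minimal sketch of the bound (4.102) for the two wavelengths in Example 4.13. For comparison, it also evaluates the same bound without the factor 0.886, which resembles the Fraunhofer distance $2D^2/\lambda$ when $D_t = D_r = D$. Both function names are hypothetical.

```python
def max_distance_half_beam(d_t, d_r, wavelength):
    """Upper bound in (4.102): half the physical beamwidth 0.886*lambda*d/(2*D_t) stays below D_r."""
    return 2 * d_t * d_r / (0.886 * wavelength)

def fraunhofer_like_bound(d_t, d_r, wavelength):
    """The same bound without the 0.886 factor, i.e., 2*D_t*D_r/lambda."""
    return 2 * d_t * d_r / wavelength

for wavelength in (0.1, 0.01):  # 3 GHz and 30 GHz
    print(f"lambda = {wavelength} m: d < {max_distance_half_beam(1.0, 1.0, wavelength):.1f} m "
          f"(2*D_t*D_r/lambda = {fraunhofer_like_bound(1.0, 1.0, wavelength):.0f} m)")
```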


Figure 4.30 shows the capacity achieved when having ULAs with M = K = 100 antennas at the transmitter and receiver. The wavelength is λ = 1 cm (30 GHz) and the antenna spacing is Δ = λ, which results in an aperture length of $D_t = D_r = 1$ m as in the last example. The arrays are deployed parallel to each other at a distance d that is varied on the horizontal axis. The SNR is defined as in (4.98) and fixed at 20 dB.¹¹ The capacity is substantially higher for all the considered distances than the reference curve $\log_2(1 + MK\,\mathrm{SNR})$, which is achieved with a rank-1 channel matrix. However, the capacity curve converges to this value as the distance grows large because then the far-field approximation becomes valid. We get a higher capacity in the considered range because H has multiple non-zero singular values, and the number grows rapidly at short distances. When $d = \frac{2D_tD_r}{0.886\lambda} \approx 226$ m, which was derived in (4.102) by comparing the transmitter's beamwidth with the receiver's aperture, the capacity is roughly twice as large as in the far-field. The capacity changes slowly with the distance, so we could alternatively use $\frac{2D_tD_r}{\lambda} = 200$ m as the approximate maximum distance for near-field spatial multiplexing because it looks similar to the Fraunhofer distance. Both points are indicated in the figure. In conclusion, for fixed aperture sizes at the transmitter and receiver, the LOS channel matrix gets more non-zero singular values when the propagation distance d shrinks because the physical beamwidth becomes smaller than the receiver array.

¹¹ If the transmit power is fixed, the SNR is also distance-dependent. However, this example focuses on showing how the singular values of the channel matrix depend on the propagation distance in the radiative near-field, so we keep the SNR fixed to highlight this effect.

Figure 4.30: The capacity of LOS MIMO channels with M = K = 100 where the transmitter and receiver are equipped with ULAs. The channel matrix gets more non-zero singular values with similar strengths at shorter propagation distances, which results in a higher capacity thanks to the larger effective multiplexing gain. The rank-1 MIMO capacity $\log_2(1 + MK\,\mathrm{SNR})$ is shown as a reference curve and SNR = 20 dB.

When communicating between two locations separated by a distance of d, 
we can optimize the antenna deployment to achieve the maximum channel 
rank and equal singular values, which gives the ideal MIMO capacity at high 
SNRs. For notational convenience and inspired by [53], we will consider two 
ULAs with M antennas and matching antenna separation Δ. The ULAs are parallel and located in each other's broadside directions. The antennas with the same index in the two ULAs are separated by the distance d, as illustrated in Figure 4.31. It then follows from the Pythagorean theorem that the distance between antenna k at the transmitter and antenna m at the receiver is
$$d_{m,k} = \sqrt{d^2 + (m-k)^2\Delta^2} = d\sqrt{1 + \frac{(m-k)^2\Delta^2}{d^2}} \approx d\left(1 + \frac{(m-k)^2\Delta^2}{2d^2}\right) = d + \frac{(m-k)^2\Delta^2}{2d}, \tag{4.103}$$
Figure 4.31: Illustration of a MIMO communication setup with two parallel ULAs with the same number of antennas and antenna spacing Δ. The distance between transmit antenna k and receive antenna m is $\sqrt{d^2 + (m-k)^2\Delta^2}$ and can be Fresnel approximated as in (4.103).


where we used the first-order Taylor approximation $\sqrt{1 + x^2} \approx 1 + \frac{x^2}{2}$ that is tight for 0 < x < 0.25. Therefore, the approximate expression is tight when the distance d exceeds the aperture length. Despite the approximation in (4.103), the derivations that will follow in this section are more accurate than the far-field approximation in (4.88), which corresponds to making the approximation $\sqrt{d^2 + (m-k)^2\Delta^2} \approx d$. More precisely, it is the difference between using a first-order and zeroth-order Taylor approximation of the distance between antennas. The simplification in (4.103) is called the Fresnel approximation [54] and models the waves as parabolic. We say that we operate in the radiative near-field region (also known as the Fresnel zone) when the Fresnel approximation is tight, but the far-field approximation is not.
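To see why the first-order term matters, the distance errors can be converted into phase errors $2\pi\,\text{error}/\lambda$. The sketch below uses illustrative values (λ = 1 cm, d = 50 m, and a largest lateral offset $(m-k)\Delta = 3.875$ m, all assumptions); with these numbers, the Fresnel phase error stays well below one radian, whereas the zeroth-order approximation $d_{m,k} \approx d$ is off by many radians.

```python
import math

def link_distances(d, offset):
    """Exact distance, Fresnel approximation (4.103), and zeroth-order approximation for offset (m-k)*Delta."""
    exact = math.sqrt(d ** 2 + offset ** 2)
    fresnel = d + offset ** 2 / (2 * d)
    far_field = d
    return exact, fresnel, far_field

# Illustrative values (assumptions): lambda = 1 cm, d = 50 m, largest offset 3.875 m
wavelength, d, offset = 0.01, 50.0, 3.875
exact, fresnel, far_field = link_distances(d, offset)

# Distance errors translated into phase errors 2*pi*error/lambda (radians)
phase_err_fresnel = 2 * math.pi * abs(exact - fresnel) / wavelength
phase_err_far = 2 * math.pi * abs(exact - far_field) / wavelength
print(f"Fresnel: {phase_err_fresnel:.3f} rad, zeroth-order: {phase_err_far:.1f} rad")
```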

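If we return to the general MIMO channel matrix expression in (4.87) and make use of the Fresnel approximation in (4.103) and that K = M, we obtain
$$\mathbf{H} = \sqrt{\beta}\begin{bmatrix} 1 & e^{-j\pi\frac{1^2\Delta^2}{\lambda d}} & \cdots & e^{-j\pi\frac{(M-1)^2\Delta^2}{\lambda d}} \\ e^{-j\pi\frac{1^2\Delta^2}{\lambda d}} & 1 & \cdots & e^{-j\pi\frac{(M-2)^2\Delta^2}{\lambda d}} \\ \vdots & \vdots & \ddots & \vdots \\ e^{-j\pi\frac{(M-1)^2\Delta^2}{\lambda d}} & e^{-j\pi\frac{(M-2)^2\Delta^2}{\lambda d}} & \cdots & 1 \end{bmatrix}, \tag{4.104}$$
where $\beta = \lambda^2/(4\pi d)^2$ is an accurate approximation of the channel gain in these cases, and the phase-shift $e^{-j2\pi d/\lambda}$ that is common to all entries has been omitted since it does not affect the singular values. We recall from Example 3.7 that the singular values of H are also the square roots of the eigenvalues of $\mathbf{H}^{\mathsf{H}}\mathbf{H}$. If the columns of H are mutually orthogonal, then $\mathbf{H}^{\mathsf{H}}\mathbf{H} = M\beta\mathbf{I}_M$, and all the singular values will be $\sqrt{M\beta}$.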
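Orthogonality between the columns can be achieved by fine-tuning the antenna spacing Δ. The magnitude of the inner product between the kth and lth column (for l ≠ k) can be computed as
$$\beta\left|\begin{bmatrix} e^{-j\pi\frac{(k-1)^2\Delta^2}{\lambda d}} \\ e^{-j\pi\frac{(k-2)^2\Delta^2}{\lambda d}} \\ \vdots \\ e^{-j\pi\frac{(k-M)^2\Delta^2}{\lambda d}} \end{bmatrix}^{\mathsf{H}}\begin{bmatrix} e^{-j\pi\frac{(l-1)^2\Delta^2}{\lambda d}} \\ e^{-j\pi\frac{(l-2)^2\Delta^2}{\lambda d}} \\ \vdots \\ e^{-j\pi\frac{(l-M)^2\Delta^2}{\lambda d}} \end{bmatrix}\right| = \beta\left|\sum_{m=1}^{M} e^{j\pi\frac{(k-m)^2\Delta^2}{\lambda d}}e^{-j\pi\frac{(l-m)^2\Delta^2}{\lambda d}}\right| = \beta\left|\sum_{m=1}^{M} e^{j2\pi\frac{(m-1)(l-k)\Delta^2}{\lambda d}}\right| = \beta\frac{\left|1 - e^{j2\pi\frac{M(l-k)\Delta^2}{\lambda d}}\right|}{\left|1 - e^{j2\pi\frac{(l-k)\Delta^2}{\lambda d}}\right|} = \beta\left|\frac{\sin\left(\pi\frac{M(l-k)\Delta^2}{\lambda d}\right)}{\sin\left(\pi\frac{(l-k)\Delta^2}{\lambda d}\right)}\right|. \tag{4.105}$$
The second equality follows from multiplying with $e^{-j\pi\frac{\Delta^2}{\lambda d}\left(k^2 - l^2 + 2(l-k)\right)}$ inside the magnitude to remove terms that are independent of m. This can be done since this term has unit magnitude. We used the formula for geometric series similarly as in (4.52) to obtain the final expression in (4.105).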

If we select the antenna spacing so that $\frac{M\Delta^2}{\lambda d} = 1$, then the numerator in (4.105) is zero while the denominator is non-zero, because $0 < |l - k| \leq M - 1$ for $l \neq k$ with $l, k \in \{1, \ldots, M\}$. Hence, if the antenna spacing in the two antenna arrays is
$$\Delta = \sqrt{\frac{\lambda d}{M}}, \tag{4.106}$$
the columns of the channel matrix in (4.104) are mutually orthogonal. The aperture lengths of the ULAs are $D = M\Delta = \sqrt{M\lambda d}$, which shrinks when the carrier frequency $f_c = c/\lambda$ is increased. The length grows with the number of antennas but slower than linearly since the antenna spacing reduces with M.

Since there are M singular values that equal $\sqrt{M\beta}$, the water-filling power allocation will assign the power equally between them. Hence, for the considered setup with two parallel M-antenna ULAs and the optimized antenna spacing in (4.106), the MIMO capacity in (3.75) becomes
$$C = M\log_2\left(1 + \frac{\frac{q}{M}M\beta}{N_0}\right) = M\log_2\left(1 + \frac{q\beta}{N_0}\right) \text{ bit/symbol}. \tag{4.107}$$
This is the ideal capacity from a multiplexing perspective.
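The orthogonality claim behind (4.106) and (4.107) can be checked numerically by forming $\mathbf{H}^{\mathsf{H}}\mathbf{H}$ for the Fresnel channel (4.104). This sketch uses illustrative values (M = 8, λ = 1 cm, d = 50 m, β = 1, all assumptions) and hypothetical helper names; it also shows that a 10% detuned spacing breaks the orthogonality.

```python
import cmath
import math

def fresnel_channel(M, spacing, wavelength, d, beta=1.0):
    """Channel (4.104): [H]_{m,k} = sqrt(beta)*exp(-j*pi*(m-k)^2*Delta^2/(lambda*d)), common phase omitted."""
    c = math.pi * spacing ** 2 / (wavelength * d)
    return [[math.sqrt(beta) * cmath.exp(-1j * c * (m - k) ** 2) for k in range(M)] for m in range(M)]

def gram(H):
    """Return H^H H, whose eigenvalues are the squared singular values of H."""
    rows, cols = len(H), len(H[0])
    return [[sum(H[m][k].conjugate() * H[m][l] for m in range(rows)) for l in range(cols)]
            for k in range(cols)]

def max_offdiag(G):
    """Largest magnitude of an off-diagonal entry, i.e., of an inner product between two columns."""
    return max(abs(G[k][l]) for k in range(len(G)) for l in range(len(G)) if k != l)

M, wavelength, d, beta = 8, 0.01, 50.0, 1.0   # illustrative values (assumptions)
spacing = math.sqrt(wavelength * d / M)       # optimized spacing (4.106)

G = gram(fresnel_channel(M, spacing, wavelength, d, beta))
off_opt = max_offdiag(G)                                   # ~0: mutually orthogonal columns
diag_err = max(abs(G[k][k] - M * beta) for k in range(M))  # diagonal equals M*beta
off_detuned = max_offdiag(gram(fresnel_channel(M, 1.1 * spacing, wavelength, d, beta)))
print(f"optimized: {off_opt:.1e}, detuned: {off_detuned:.3f}, diagonal error: {diag_err:.1e}")
```

With $\mathbf{H}^{\mathsf{H}}\mathbf{H} = M\beta\mathbf{I}_M$, all M singular values equal $\sqrt{M\beta}$, which is what allows the equal power split in (4.107).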


Example 4.14. Suppose we want to design an LOS MIMO system with M = K = 32 antennas where the propagation distance is d = 50 m. Which antenna spacing is needed to achieve M identical singular values if λ = 1 cm (30 GHz)? What is the resulting aperture length?

The antenna spacing in (4.106) becomes $\Delta = \sqrt{\frac{\lambda d}{M}} = \sqrt{\frac{0.01 \cdot 50}{32}} = 0.125$ m, which is 12.5 wavelengths. This unusually large separation will enable the receiver to detect the spherical wavefronts. The aperture length becomes MΔ = 4 m.
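The numbers in Example 4.14 follow from (4.106) in a few lines. This is a minimal sketch using only the values stated in the example:

```python
import math

wavelength, d, M = 0.01, 50.0, 32          # Example 4.14: 30 GHz, 50 m, 32 antennas
spacing = math.sqrt(wavelength * d / M)    # optimized spacing (4.106)
aperture = M * spacing                     # aperture length M*Delta, equal to sqrt(M*lambda*d)
print(f"Delta = {spacing:.3f} m = {spacing / wavelength:.1f} wavelengths, aperture = {aperture:.1f} m")
```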
Figure 4.32: When the antenna spacing of the ULAs in Figure 4.31 is optimized according
to (4.106), then the channel matrix has a full rank, all the singular values are equal, and the 
capacity is achieved by transmitting independent signals from each antenna (as illustrated by 
the coloring). The receiver will use the phase-shift variations created by the spherical wavefronts 
to separate the transmitted signals. 


The antenna spacing was fine-tuned in (4.106) to achieve orthogonal 
columns in H. The singular value decomposition is 


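$$\mathbf{H} = \underbrace{\frac{1}{\sqrt{M}}\begin{bmatrix} 1 & e^{-j\pi\frac{1}{M}} & \cdots & e^{-j\pi\frac{(M-1)^2}{M}} \\ e^{-j\pi\frac{1}{M}} & 1 & \cdots & e^{-j\pi\frac{(M-2)^2}{M}} \\ \vdots & \vdots & \ddots & \vdots \\ e^{-j\pi\frac{(M-1)^2}{M}} & e^{-j\pi\frac{(M-2)^2}{M}} & \cdots & 1 \end{bmatrix}}_{=\mathbf{U}}\begin{bmatrix} \sqrt{\beta M} & 0 & \cdots & 0 \\ 0 & \sqrt{\beta M} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\beta M} \end{bmatrix}\underbrace{\mathbf{I}_M}_{=\mathbf{V}^{\mathsf{H}}}, \tag{4.108}$$
where the left singular vectors in U are the normalized columns of H in (4.104) when using the antenna spacing in (4.106). The matrix V with the right singular vectors is an identity matrix and is used for precoding. The transmitted signal becomes $\mathbf{x} = \mathbf{V}\bar{\mathbf{x}} = \bar{\mathbf{x}}$, which implies that each of the independent signals in $\bar{\mathbf{x}}$ is transmitted from only one of the antennas. The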
receiver then utilizes the fact that the spherical wavefronts create varying 
phase-shifts over the receive antennas to separate the signals while achieving 
identical signal strengths for all of them. This mode of operation is illustrated 
in Figure 4.32. This structure is unique to the optimized antenna spacing, 
which is fine-tuned so that when the receiver focuses a beam on a transmit 
antenna, the other antennas are exactly at the nulls of the beam pattern. If we 
use slightly different spacings, the channel matrix will have slight variations 
in the singular values, and the precoding will not reduce to transmitting 
independent signals from the antennas. 

We analyzed parallel ULAs in this section, in which case all the phase-shifts in H are caused by the spherical wavefronts. When the arrays are rotated differently, there will be further phase-shifts since plane waves also give rise to phase-shifts, as shown in (4.90) for far-field communication. The unique feature of communicating in the radiative near-field is that the MIMO channel matrix gets a higher rank than 1 because the phase-shifts caused by the spherical wavefronts vary non-linearly with the antenna index. The traditional Fraunhofer distance $2D^2/\lambda$ is unable to determine the upper limit of the near-field region of MIMO channels because it was derived for a MISO/SIMO channel with an aperture length of D on one side and a single isotropic antenna on the other side. In MIMO scenarios, we must take into account both the transmitter's aperture length $D_t$, since it determines the narrowness of the beam, and the receiver's aperture length $D_r$, since it determines the ability to observe spherical wavefronts. One way to characterize the radiative near-field is that the propagation distance d must satisfy
$$d < \frac{2D_tD_r}{\lambda}, \tag{4.109}$$
where the upper bound is called the near-field multiplexing distance. We stress that this is a rule-of-thumb for when we can at least double the capacity compared to the far-field, but the capacity varies slowly around this value, so alternative upper limits can be defined.


4.5 Three-Dimensional Far-Field Channel Modeling 


The MIMO channel matrix expression in (4.87) depends on the exact propa- 
gation distances between every pair of transmit and receive antennas; thus, 
it can be utilized to model any free-space LOS channel. In contrast, all the 
simplified expressions for ULAs that have been derived so far are limited in 
their generality by the choice of array geometry and the assumption that all 
antennas are located in the same two-dimensional plane. When analyzing 
SIMO and MISO channels with ULAs, we can always define the coordinate 
system such that all antennas are located in the same plane. By contrast, 
this is not always possible in the MIMO case; for example, one array might 
be deployed horizontally and the other array vertically. In this section, we 
will consider the general case where the transmitter and receiver can have 
arbitrary array geometries. The only limiting assumption is that the receiver 
is in the far-field of the transmitter. 

We begin by considering the SIMO setup illustrated in Figure 4.33, where 
a single-antenna transmitter sends a signal toward two receive antennas. The 
receive antennas are in the far-field of the transmitter; thus, the impinging 
wavefront can be approximated as planar. The location of each antenna 
is represented by a three-dimensional vector, representing a point in the 
three-dimensional world. Suppose we define the coordinate system such that 
one receive antenna is located in the origin, denoted by $\mathbf{0} = [0, 0, 0]^{\mathsf{T}}$. The transmitter location is $\mathbf{d} \in \mathbb{R}^3$, while the location of the other receive antenna is $\mathbf{u} \in \mathbb{R}^3$. The impinging wave will generally reach the receive antennas at slightly different times, determined by the difference in propagation distances, leading to phase differences when the signals are sampled simultaneously. To determine this phase difference, we define the unit-length vector
$$\mathbf{p} = \frac{\mathbf{d}}{\|\mathbf{d}\|} \tag{4.110}$$
that points from the origin towards the transmitter. Since the planar wavefront is perpendicular to p, we can determine the path difference between the two receive antennas by projecting the location u of the second receive antenna onto p. The orthogonal projection is given by $\mathbf{u}^{\mathsf{T}}\mathbf{p} \in \mathbb{R}$ and represents how much shorter the distance is to the second receive antenna, compared to the distance to the antenna in the origin. A negative value implies a longer distance to the second antenna.

Figure 4.33: Illustration of a setup with two receive antennas, one located at the origin 0 and one at location u, receiving a planar wavefront emitted by a transmitter at location d. The unit-length vector p points in the direction towards the transmitter. The difference in propagation distance between the two receive antennas is $\mathbf{u}^{\mathsf{T}}\mathbf{p}$.

Suppose the impinging signal has wavelength λ. In that case, $\mathbf{u}^{\mathsf{T}}\mathbf{p}/\lambda$ represents how many wavelengths shorter the propagation distance is to the second antenna, while the corresponding phase-shift is
$$-\frac{2\pi}{\lambda}\mathbf{u}^{\mathsf{T}}\mathbf{p}. \tag{4.111}$$
The channel response will then be $\sqrt{\beta}\,e^{j2\pi\mathbf{u}^{\mathsf{T}}\mathbf{p}/\lambda}$, where β ∈ [0, 1] is the channel gain and the minus sign disappeared since phase-shifts appear with a minus sign in channel models. Recall that the unit-length vector p specifies the direction-of-arrival of the planar wavefront using Cartesian coordinates; however, it can be more instructive to describe it using angles. To this end, we will make use of the spherical coordinate system, defined in Figure 1.9, and parametrize the directional vector p in terms of the azimuth angle φ ∈ [−π, π) and the elevation angle θ ∈ [−π/2, π/2]. The one-to-one mapping between these coordinate systems was stated in (1.22) and implies that
$$\mathbf{p} = \begin{bmatrix} \cos(\varphi)\cos(\theta) \\ \sin(\varphi)\cos(\theta) \\ \sin(\theta) \end{bmatrix}. \tag{4.112}$$
By substituting (4.112) into (4.111), we can represent the phase-shift using the azimuth and elevation angles.

Figure 4.34: Illustration of a setup where a planar wave impinges on an array with M receive antennas from the azimuth angle φ and elevation angle θ.


We will now consider a SIMO channel with M receive antennas where the location of antenna m is denoted by $\mathbf{u}_m \in \mathbb{R}^3$, as illustrated in Figure 4.34. A planar wave is impinging from the angular direction (φ, θ), measured from the origin wherever it might be. None of the antennas need to be located at the origin, but we will still utilize it as the reference point when computing the phase-shifts. More precisely, the sampling delay is selected to obtain a zero-valued phase-shift at the origin. The phase-shift at the mth antenna will then be $-2\pi\mathbf{u}_m^{\mathsf{T}}\mathbf{p}/\lambda$, where p is computed using (4.112). We can define the array response vector
$$\mathbf{a}(\varphi, \theta) = \begin{bmatrix} e^{j2\pi\mathbf{u}_1^{\mathsf{T}}\mathbf{p}/\lambda} \\ \vdots \\ e^{j2\pi\mathbf{u}_M^{\mathsf{T}}\mathbf{p}/\lambda} \end{bmatrix} \tag{4.113}$$
as the normalized channel vector (i.e., without the channel gain) for the case when the impinging signal has the angles-of-arrival (φ, θ). If all the antennas are isotropic and $\beta = \lambda^2/(4\pi d)^2$ denotes the channel gain at a propagation distance of d, then the SIMO channel vector can be expressed using (4.113) as
$$\mathbf{h} = \sqrt{\beta}\,\mathbf{a}(\varphi, \theta). \tag{4.114}$$
This channel vector can also be utilized for the MISO channel obtained by reversing the roles of the transmitter and receiver.
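The array response (4.113) works for any antenna geometry, which makes it straightforward to code. The following sketch computes p from (4.112), the array response, and the isotropic SIMO channel (4.114). The three-antenna geometry, the angles, the distance, and the function names are illustrative assumptions.

```python
import cmath
import math

def direction(phi, theta):
    """Unit direction-of-arrival vector p(phi, theta) from (4.112)."""
    return (math.cos(phi) * math.cos(theta), math.sin(phi) * math.cos(theta), math.sin(theta))

def array_response(positions, phi, theta, wavelength):
    """Array response (4.113): entry m is exp(j*2*pi*u_m^T p / lambda); the origin has zero phase."""
    p = direction(phi, theta)
    return [cmath.exp(2j * math.pi * sum(u_i * p_i for u_i, p_i in zip(u, p)) / wavelength)
            for u in positions]

def simo_channel(positions, phi, theta, wavelength, distance):
    """SIMO channel (4.114) for isotropic antennas, with beta = lambda^2 / (4*pi*d)^2."""
    beta = (wavelength / (4 * math.pi * distance)) ** 2
    return [math.sqrt(beta) * a for a in array_response(positions, phi, theta, wavelength)]

# Arbitrary example geometry (an assumption): three antennas, one of them at the origin
wavelength = 0.1
positions = [(0.0, 0.0, 0.0), (0.03, -0.02, 0.05), (-0.1, 0.04, 0.0)]
a = array_response(positions, math.radians(30), math.radians(10), wavelength)
h = simo_channel(positions, math.radians(30), math.radians(10), wavelength, distance=100.0)
print([round(abs(x), 12) for x in a])  # every entry has unit magnitude
```

Only the phases depend on the geometry; the magnitude of every channel entry is $\sqrt{\beta} = \lambda/(4\pi d)$.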
Example 4.15. Consider two receive antennas deployed on the y-axis at the locations $\mathbf{u}_1 = [0, \lambda/4, 0]^{\mathsf{T}}$ and $\mathbf{u}_2 = [0, -\lambda/4, 0]^{\mathsf{T}}$. What is the array response vector a(φ, θ)? How does a(φ, θ) depend on the azimuth angle φ when the elevation angle is θ = 0 or θ = π/2?

The antenna separation is λ/2. The array response vector for an arbitrary angle (φ, θ) is obtained using (4.113) as
$$\mathbf{a}(\varphi, \theta) = \begin{bmatrix} e^{j2\pi\mathbf{u}_1^{\mathsf{T}}\mathbf{p}/\lambda} \\ e^{j2\pi\mathbf{u}_2^{\mathsf{T}}\mathbf{p}/\lambda} \end{bmatrix} = \begin{bmatrix} e^{j\frac{\pi}{2}\sin(\varphi)\cos(\theta)} \\ e^{-j\frac{\pi}{2}\sin(\varphi)\cos(\theta)} \end{bmatrix}. \tag{4.115}$$
When θ = 0, (4.115) simplifies to
$$\mathbf{a}(\varphi, 0) = \begin{bmatrix} e^{j\frac{\pi}{2}\sin(\varphi)} \\ e^{-j\frac{\pi}{2}\sin(\varphi)} \end{bmatrix}, \tag{4.116}$$
where there is a phase-shift difference of π sin(φ) between the antennas. The same phase difference between the adjacent antennas was obtained in (4.23) for a ULA with Δ = λ/2 that was also deployed along the y-axis.

When θ = π/2, the transmitter is at a point along the z-axis and is unaffected by φ since $\mathbf{p} = [0, 0, 1]^{\mathsf{T}}$. Hence, the impinging wave is always from the broadside direction, and the corresponding array response vector is
$$\mathbf{a}\left(\varphi, \frac{\pi}{2}\right) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{4.117}$$


4.5.1 Array Response Vector with a ULA in Three Dimensions 


We will now particularize the array response vector in (4.113) for a ULA with $M$ antennas where the spacing is $\Delta$. We assume that the ULA is deployed along the y-axis, with the first antenna located in the origin and the remaining antennas located along the negative side of the axis. This assumption can be made without losing generality since we can rotate the coordinate system as we like. Under these assumptions, the location of receive antenna $m$ becomes

$$\mathbf{u}_m = \begin{bmatrix} 0 \\ -(m-1)\Delta \\ 0 \end{bmatrix}. \tag{4.118}$$

This setup is illustrated in Figure 4.35. The inner product between the location vector in (4.118) and the direction-of-arrival vector in (4.112) becomes

$$\mathbf{u}_m^{\mathrm{T}}\mathbf{p} = \begin{bmatrix} 0 \\ -(m-1)\Delta \\ 0 \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} \cos(\varphi)\cos(\theta) \\ \sin(\varphi)\cos(\theta) \\ \sin(\theta) \end{bmatrix} = -(m-1)\Delta\sin(\varphi)\cos(\theta). \tag{4.119}$$
0 sin(@) 
Figure 4.35: Illustration of a setup where a planar wave impinges on a ULA with $M$ receive antennas from the azimuth angle $\varphi$ and elevation angle $\theta$.


If we substitute (4.119) into (4.113), we obtain the array response vector for 
a ULA as 


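$$\mathbf{a}_M(\varphi,\theta) = \begin{bmatrix} 1 \\ e^{-j\frac{2\pi}{\lambda}\Delta\sin(\varphi)\cos(\theta)} \\ \vdots \\ e^{-j\frac{2\pi}{\lambda}(M-1)\Delta\sin(\varphi)\cos(\theta)} \end{bmatrix}, \tag{4.120}$$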

where the subscript denotes the number of antennas. This is a generalization 
of the previous array response vector in (4.37) since the impinging wave can 
arrive from any angular direction (not limited to the horizontal xy-plane). 
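The closed form (4.120) can be cross-checked against the general expression (4.113) by inserting the positions (4.118). The sketch below assumes an 8-antenna ULA with half-wavelength spacing and arbitrary test angles; the helper names `ula_response` and `generic_response` are hypothetical.

```python
import cmath
import math

def ula_response(M, spacing, phi, theta, wavelength):
    """Closed-form ULA array response (4.120); k plays the role of m - 1 in the text."""
    return [cmath.exp(-2j * math.pi * k * spacing * math.sin(phi) * math.cos(theta) / wavelength)
            for k in range(M)]

def generic_response(positions, phi, theta, wavelength):
    """General array response (4.113) built from antenna positions and p in (4.112)."""
    p = (math.cos(phi) * math.cos(theta), math.sin(phi) * math.cos(theta), math.sin(theta))
    return [cmath.exp(2j * math.pi * sum(u_k * p_k for u_k, p_k in zip(u, p)) / wavelength)
            for u in positions]

M, lam = 8, 0.1
spacing = lam / 2
positions = [(0.0, -k * spacing, 0.0) for k in range(M)]   # locations (4.118)
phi, theta = 0.4, -0.3
closed = ula_response(M, spacing, phi, theta, lam)
direct = generic_response(positions, phi, theta, lam)
print(max(abs(x - y) for x, y in zip(closed, direct)))     # numerically zero (rounding only)
```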


Example 4.16. Consider a ULA with $M$ antennas deployed along the z-axis. The antenna spacing is $\Delta$ and the $m$th element is located at $\mathbf{u}_m = [0, 0, -(m-1)\Delta]^{\mathrm{T}}$. What is the array response vector?

This ULA is deployed vertically. The inner product between the location vector $\mathbf{u}_m$ and the direction-of-arrival vector in (4.112) becomes

$$\mathbf{u}_m^{\mathrm{T}}\mathbf{p} = \begin{bmatrix} 0 \\ 0 \\ -(m-1)\Delta \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} \cos(\varphi)\cos(\theta) \\ \sin(\varphi)\cos(\theta) \\ \sin(\theta) \end{bmatrix} = -(m-1)\Delta\sin(\theta). \tag{4.121}$$

The array response vector of this vertical ULA is obtained by substituting (4.121) into (4.113), which yields

$$\mathbf{a}(\varphi,\theta) = \begin{bmatrix} 1 \\ e^{-j\frac{2\pi}{\lambda}\Delta\sin(\theta)} \\ \vdots \\ e^{-j\frac{2\pi}{\lambda}(M-1)\Delta\sin(\theta)} \end{bmatrix}. \tag{4.122}$$

This expression differs from the one in (4.120) for a horizontal ULA due to the different deployment directions relative to the assumed spherical coordinate system. However, (4.122) matches $\mathbf{a}_M(\theta,0)$ from (4.120), that is, the horizontal ULA's response vector evaluated with zero elevation angle and with the azimuth angle replaced by $\theta$.
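A short numerical check of Example 4.16, as a sketch under the assumption of a 6-antenna vertical ULA with half-wavelength spacing: the general formula (4.113) with positions $[0, 0, -(m-1)\Delta]^{\mathrm{T}}$ should not depend on the azimuth angle, and (4.122) should coincide with $\mathbf{a}_M(\theta,0)$ from (4.120). All function names are hypothetical.

```python
import cmath
import math

def ula_response(M, spacing, phi, theta, wavelength):
    """Horizontal ULA response (4.120)."""
    return [cmath.exp(-2j * math.pi * k * spacing * math.sin(phi) * math.cos(theta) / wavelength)
            for k in range(M)]

def vertical_ula_response(M, spacing, theta, wavelength):
    """Vertical ULA response (4.122) for antennas at [0, 0, -(m-1)*spacing]^T."""
    return [cmath.exp(-2j * math.pi * k * spacing * math.sin(theta) / wavelength) for k in range(M)]

def generic_response(positions, phi, theta, wavelength):
    """General array response (4.113)."""
    p = (math.cos(phi) * math.cos(theta), math.sin(phi) * math.cos(theta), math.sin(theta))
    return [cmath.exp(2j * math.pi * sum(u_k * p_k for u_k, p_k in zip(u, p)) / wavelength)
            for u in positions]

M, lam = 6, 0.1
spacing = lam / 2
positions = [(0.0, 0.0, -k * spacing) for k in range(M)]   # vertical ULA of Example 4.16
theta = 0.35
vert = vertical_ula_response(M, spacing, theta, lam)
for phi in (-1.0, 0.0, 1.2):   # the azimuth angle should not matter
    print(max(abs(x - y) for x, y in zip(vert, generic_response(positions, phi, theta, lam))))
# (4.122) equals the horizontal ULA response a_M(theta, 0) from (4.120).
print(max(abs(x - y) for x, y in zip(vert, ula_response(M, spacing, theta, 0.0, lam))))
```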
There are many ways to express the array response vector of a ULA, depending on how the coordinate system is rotated relative to the array. Two examples are given in (4.120) and (4.122). While the transmitter's location relative to the receiver matters when determining the beamwidth, the communication performance is the same irrespective of how the coordinate system is rotated. A three-dimensional array response model is necessary when multiple wavefronts impinge via different propagation paths, since we then cannot rotate the coordinate system to place all of them in a two-dimensional plane.


4.5.2 Array Response Vector with a Uniform Planar Array 


Many practical antenna arrays are planar, which means that the antennas are 
deployed in two dimensions: one horizontally and one vertically. There are 
three main benefits of this. Firstly, if an array is designed with a maximally 
allowed aperture length, we can fit more antennas by distributing them over 
two dimensions. This is because the length is then measured diagonally, as 
previously illustrated in Figure 4.1(c). This allows for a larger beamforming 
gain in a size-constrained deployment. Secondly, a horizontal ULA can have a 
small beamwidth in the horizontal plane while it spreads the power equally 
over all elevation angles (see Figure 4.18). Since the prospective users are 
typically below the base station (i.e., the elevation angles of interest are 
$\theta \in [-\pi/2, 0]$), half of the power is lost by radiating it into the sky. A planar
array can have a small beamwidth also in the vertical plane. Thirdly, a planar 
array is capable of 3D beamforming, where it points different beams towards 
objects/users located in similar azimuth angles but different elevation angles. 

We will analyze the canonical form of a planar array: the uniform planar 
array (UPA) where the antennas are deployed on an evenly spaced grid in 
two dimensions, as illustrated in Figure 4.36. Each row has $M_H$ antennas with the spacing $\Delta$ between the adjacent antennas. Similarly, each column has $M_V$ antennas with the spacing $\Delta$ between adjacent antennas.¹² Since there are $M_V$ rows and $M_H$ columns, the total number of antennas is $M = M_H M_V$. The horizontal spacing between the centers of the two outermost antennas in each row is $(M_H - 1)\Delta$. Similarly, the vertical spacing between the centers of the two outermost antennas in each column is $(M_V - 1)\Delta$. Since each antenna also has a physical size, we will denote the horizontal length as $M_H\Delta$ and the vertical length as $M_V\Delta$ (in line with what we did when analyzing ULAs).

The aperture length D is measured along the diagonal of the UPA. It 
follows from the Pythagorean theorem that 


$$D = \sqrt{(M_H\Delta)^2 + (M_V\Delta)^2} = \sqrt{M_H^2 + M_V^2}\,\Delta. \tag{4.123}$$


¹² We have selected equal horizontal and vertical antenna spacings for notational convenience,
but this assumption can be generalized. It makes sense for devices to have the same spacings in 
both dimensions because they can be rotated freely by the user. However, fixed base stations 
commonly use a larger vertical than horizontal spacing to achieve a narrower vertical beamwidth. 
Figure 4.36: Illustration of a setup where a planar wave impinges on a UPA from the azimuth angle $\varphi$ and elevation angle $\theta$. The UPA has $M_H$ antennas per row and $M_V$ antennas per column on a grid in the yz-plane. The horizontal and vertical spacings are $\Delta$.


Example 4.17. Consider an array with $M = 100$ antennas with $\Delta = \lambda/2$ spacing that is designed for $\lambda = 0.1$ m (i.e., 3 GHz). Compare the aperture lengths obtained if the array is a ULA or a UPA with $10 \times 10$ antennas.

With the ULA configuration, the aperture length is $D = M\Delta = 100 \cdot 0.05 = 5$ m. With the square-shaped UPA configuration, the aperture length in (4.123) becomes $D = \sqrt{10^2 + 10^2} \cdot 0.05 = \sqrt{2} \cdot 0.5 \approx 0.7$ m. The horizontal and vertical lengths of the UPA are $10 \cdot 0.05 = 0.5$ m, which is ten times smaller than with the ULA. In conclusion, the UPA configuration enables the given number of antennas to be deployed in a physically smaller form factor. This feature enables large numbers of antennas in practical deployments.
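The arithmetic of Example 4.17 can be reproduced with a few lines of Python. This is a sketch; `ula_aperture` and `upa_aperture` are hypothetical helper names, and `math.hypot` computes the Pythagorean diagonal in (4.123).

```python
import math

def ula_aperture(M, spacing):
    """Aperture length M*spacing of a ULA (physical length including the element size)."""
    return M * spacing

def upa_aperture(M_H, M_V, spacing):
    """Diagonal aperture length (4.123): sqrt((M_H*spacing)^2 + (M_V*spacing)^2)."""
    return math.hypot(M_H * spacing, M_V * spacing)

wavelength = 0.1                  # 3 GHz
spacing = wavelength / 2
print(round(ula_aperture(100, spacing), 3))      # 5.0 (meters)
print(round(upa_aperture(10, 10, spacing), 3))   # 0.707 (meters)
print(round(10 * spacing, 3))                    # 0.5 (meters): each side of the UPA
```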


We will now particularize the general array response vector expression in (4.113) for a UPA with the antenna spacing $\Delta$. We assume that the UPA is deployed along the yz-plane with the first antenna located in the origin and the remaining antennas located along the negative side of the y-axis and z-axis, as illustrated in Figure 4.36. This assumption can be made without loss of generality since we can define/rotate the coordinate system as we like. There are $M_V$ rows, each extending horizontally along the negative y-axis and containing $M_H$ antennas. Similarly, there are $M_H$ columns, each extending vertically along the negative z-axis and containing $M_V$ antennas.

Figure 4.37: Illustration of a UPA with $M = 20$ antennas, which are divided into $M_H = 5$ antennas per row and $M_V = 4$ antennas per column. The antennas are numbered row-by-row from 1 to 20, as shown in the figure. When characterizing the array response vector, it is useful to characterize the horizontal index as in (4.124) and the vertical index as in (4.125).

The first antenna is located in the origin. Suppose the antennas are then consecutively indexed row-by-row by $m \in \{1,\ldots,M\}$, where $M = M_H M_V$ is the total number of antennas. The horizontal indices of the first $M_H$ antennas will be $0,1,\ldots,M_H-1$. These indices are repeated on each row; thus, the antennas $(n-1)M_H+1,\ldots,nM_H$ also have the horizontal indices $0,1,\ldots,M_H-1$, for $n = 2,\ldots,M_V$. Using this row-by-row indexing, the horizontal index of antenna $m$ can be computed as

$$i(m) = (m-1) - M_H\left\lfloor \frac{m-1}{M_H} \right\rfloor \in \{0,1,\ldots,M_H-1\}, \tag{4.124}$$

where $\lfloor\cdot\rfloor$ rounds the argument to the closest smaller or equal integer. The computation in (4.124) gives the remainder when dividing $m-1$ by $M_H$ and is known as the modulo operation.

Next, we will define the vertical index, which will be 0 for the $M_H$ antennas on the first row. The vertical index will then be $n-1$ for the antennas $(n-1)M_H+1,\ldots,nM_H$, for $n = 2,\ldots,M_V$. Hence, the vertical index of antenna $m$ can be obtained as

$$j(m) = \left\lfloor \frac{m-1}{M_H} \right\rfloor \in \{0,1,\ldots,M_V-1\}, \tag{4.125}$$

which returns the integer-valued quotient when dividing $m-1$ by $M_H$. The mapping between antenna numbers and horizontal/vertical indices is illustrated in Figure 4.37 for a UPA with $M_H = 5$ and $M_V = 4$.
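The index formulas (4.124) and (4.125) translate directly into integer arithmetic. In this sketch, the names `horizontal_index` and `vertical_index` are hypothetical; in Python, `divmod(m - 1, M_H)` would return the pair $(j(m), i(m))$ in a single call.

```python
def horizontal_index(m, M_H):
    """i(m) in (4.124): remainder of (m - 1) divided by M_H, for 1-based antenna number m."""
    return (m - 1) - M_H * ((m - 1) // M_H)

def vertical_index(m, M_H):
    """j(m) in (4.125): integer quotient of (m - 1) divided by M_H."""
    return (m - 1) // M_H

M_H, M_V = 5, 4   # the UPA of Figure 4.37
for row in range(M_V):
    numbers = range(row * M_H + 1, (row + 1) * M_H + 1)
    print([(m, horizontal_index(m, M_H), vertical_index(m, M_H)) for m in numbers])
```

For example, antenna 7 gets the indices $i(7) = 1$ and $j(7) = 1$, that is, the second antenna on the second row.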
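Under these indexing assumptions, the location of antenna $m$ becomes

$$\mathbf{u}_m = \begin{bmatrix} 0 \\ -i(m)\Delta \\ -j(m)\Delta \end{bmatrix}. \tag{4.126}$$

The inner product between the location vector in (4.126) and the direction-of-arrival vector in (4.112) becomes

$$\mathbf{u}_m^{\mathrm{T}}\mathbf{p} = \begin{bmatrix} 0 \\ -i(m)\Delta \\ -j(m)\Delta \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} \cos(\varphi)\cos(\theta) \\ \sin(\varphi)\cos(\theta) \\ \sin(\theta) \end{bmatrix} = -i(m)\Delta\sin(\varphi)\cos(\theta) - j(m)\Delta\sin(\theta). \tag{4.127}$$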

If we substitute (4.127) into (4.113), we obtain the array response vector for the UPA as

$$\begin{aligned}
\mathbf{a}_{M_H,M_V}(\varphi,\theta) &= \begin{bmatrix} 1 \\ e^{-j\frac{2\pi}{\lambda}\left(i(2)\Delta\sin(\varphi)\cos(\theta) + j(2)\Delta\sin(\theta)\right)} \\ \vdots \\ e^{-j\frac{2\pi}{\lambda}\left(i(M)\Delta\sin(\varphi)\cos(\theta) + j(M)\Delta\sin(\theta)\right)} \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{a}_{M_H}(\varphi,\theta) \\ e^{-j\frac{2\pi}{\lambda}\Delta\sin(\theta)}\,\mathbf{a}_{M_H}(\varphi,\theta) \\ \vdots \\ e^{-j\frac{2\pi}{\lambda}(M_V-1)\Delta\sin(\theta)}\,\mathbf{a}_{M_H}(\varphi,\theta) \end{bmatrix} = \mathbf{a}_{M_V}(\theta,0) \otimes \mathbf{a}_{M_H}(\varphi,\theta).
\end{aligned} \tag{4.128}$$


The subscript of $\mathbf{a}_{M_H,M_V}(\varphi,\theta)$ denotes the number of antennas along the horizontal and vertical axes, respectively. The derivation utilizes the fact that the horizontal index changes between each entry while the vertical index only changes after $M_H$ entries. On the last row in (4.128), we first recognize the array response vector $\mathbf{a}_{M_H}(\varphi,\theta)$ in (4.120) for a horizontally deployed ULA with $M_H$ antennas with separation $\Delta$. Next, we identify a Kronecker product between this vector and the array response vector $\mathbf{a}_{M_V}(\theta,0)$ of a vertically deployed ULA with $M_V$ antennas with separation $\Delta$. This setup was previously characterized in Example 4.16.

In summary, the array response vector of a UPA is the Kronecker product of the array response vectors of two ULAs, one vertical and one horizontal. Each vector entry contains the phase-shift the corresponding antenna experiences compared to the first antenna in the UPA, located in the origin. The phase-shift depends on the azimuth angle $\varphi$, elevation angle $\theta$, antenna spacing $\Delta$, and the number of horizontal and vertical antennas. Note that if we set $M_V = 1$ and $M = M_H$, the array response vector in (4.128) reduces to the previous result in (4.120) for a horizontal ULA with $M$ antennas.
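The Kronecker structure in (4.128) can be verified numerically by comparing it with the entry-by-entry definition built from the positions (4.126). The sketch below stores vectors as Python lists; `kron`, `upa_kron`, and `upa_direct` are hypothetical helper names, and the spacing is assumed to be half a wavelength.

```python
import cmath
import math

def ula(M, spacing, phi, theta, lam):
    """ULA array response (4.120)."""
    return [cmath.exp(-2j * math.pi * k * spacing * math.sin(phi) * math.cos(theta) / lam)
            for k in range(M)]

def kron(a, b):
    """Kronecker product of two vectors stored as lists: the blocks a_1*b, a_2*b, ... stacked."""
    return [x * y for x in a for y in b]

def upa_kron(M_H, M_V, spacing, phi, theta, lam):
    """UPA response via the last row of (4.128): a_{M_V}(theta, 0) kron a_{M_H}(phi, theta)."""
    return kron(ula(M_V, spacing, theta, 0.0, lam), ula(M_H, spacing, phi, theta, lam))

def upa_direct(M_H, M_V, spacing, phi, theta, lam):
    """UPA response from (4.113) with the positions (4.126) and row-by-row indexing."""
    p = (math.cos(phi) * math.cos(theta), math.sin(phi) * math.cos(theta), math.sin(theta))
    out = []
    for m in range(1, M_H * M_V + 1):
        j_idx, i_idx = divmod(m - 1, M_H)            # (4.125) and (4.124)
        u = (0.0, -i_idx * spacing, -j_idx * spacing)
        out.append(cmath.exp(2j * math.pi * sum(a * b for a, b in zip(u, p)) / lam))
    return out

lam = 0.1
for M_H, M_V in ((2, 3), (5, 4)):
    err = max(abs(x - y) for x, y in zip(upa_kron(M_H, M_V, lam / 2, 0.6, 0.25, lam),
                                          upa_direct(M_H, M_V, lam / 2, 0.6, 0.25, lam)))
    print(M_H, M_V, err)    # err is at rounding-error level
```

The case $M_H = 2$, $M_V = 3$ corresponds to Example 4.18 below.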


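Example 4.18. What is the array response vector of a UPA with $M_H = 2$, $M_V = 3$, and the antenna spacing $\Delta = \lambda/2$?

According to (4.128), we can compute the array response vector as

$$\mathbf{a}_{2,3}(\varphi,\theta) = \mathbf{a}_3(\theta,0) \otimes \mathbf{a}_2(\varphi,\theta), \tag{4.129}$$

which is the Kronecker product between the array response vectors of two ULAs. We can compute those vectors using (4.120) as

$$\mathbf{a}_3(\theta,0) = \begin{bmatrix} 1 \\ e^{-j\pi\sin(\theta)} \\ e^{-j2\pi\sin(\theta)} \end{bmatrix}, \quad \mathbf{a}_2(\varphi,\theta) = \begin{bmatrix} 1 \\ e^{-j\pi\sin(\varphi)\cos(\theta)} \end{bmatrix}. \tag{4.130}$$

We can now use the definition (2.54) of a Kronecker product to compute the UPA's array response vector as

$$\mathbf{a}_{2,3}(\varphi,\theta) = \begin{bmatrix} 1 \cdot \mathbf{a}_2(\varphi,\theta) \\ e^{-j\pi\sin(\theta)}\,\mathbf{a}_2(\varphi,\theta) \\ e^{-j2\pi\sin(\theta)}\,\mathbf{a}_2(\varphi,\theta) \end{bmatrix} = \begin{bmatrix} 1 \\ e^{-j\pi\sin(\varphi)\cos(\theta)} \\ e^{-j\pi\sin(\theta)} \\ e^{-j\pi\sin(\theta)}e^{-j\pi\sin(\varphi)\cos(\theta)} \\ e^{-j2\pi\sin(\theta)} \\ e^{-j2\pi\sin(\theta)}e^{-j\pi\sin(\varphi)\cos(\theta)} \end{bmatrix}. \tag{4.131}$$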


4.5.3 Horizontal and Vertical Beamwidths with UPAs 


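The beamwidth has been characterized previously in this chapter for a ULA, but only considering the horizontal plane where $\theta = 0$. In this section, we will extend the analysis from Section 4.3.4 to the case of a UPA and derive the beamwidths of 3D beamforming. The channel vector for a UPA with an arbitrary antenna spacing $\Delta$ and wavelength $\lambda$ is $\mathbf{h} = \sqrt{\beta}\,\mathbf{a}_{M_H,M_V}(\varphi,\theta)$, where $\beta$ is the channel gain and the array response vector is given by (4.128).

Suppose we transmit a signal in the direction $(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})$, where $\varphi_{\mathrm{beam}} \in [-\pi/2,\pi/2]$ is the azimuth angle and $\theta_{\mathrm{beam}} \in [-\pi/2,\pi/2]$ is the elevation angle, using MRT with $\mathbf{p} = \mathbf{a}^*_{M_H,M_V}(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})/\|\mathbf{a}_{M_H,M_V}(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})\|$. We can then follow the approach in (4.75) to determine the beamforming gain that is observed by a receiver located in any other direction $\varphi \in [-\pi/2,\pi/2]$, $\theta \in [-\pi/2,\pi/2]$:

$$\begin{aligned}
\frac{\left|\mathbf{a}_{M_H,M_V}^{\mathrm{T}}(\varphi,\theta)\,\mathbf{a}_{M_H,M_V}^{*}(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})\right|^2}{\|\mathbf{a}_{M_H,M_V}(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})\|^2}
&= \frac{1}{M_H M_V}\left|\left(\mathbf{a}_{M_V}(\theta,0)\otimes\mathbf{a}_{M_H}(\varphi,\theta)\right)^{\mathrm{T}}\left(\mathbf{a}_{M_V}(\theta_{\mathrm{beam}},0)\otimes\mathbf{a}_{M_H}(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})\right)^{*}\right|^2 \\
&= \frac{1}{M_V}\left|\mathbf{a}_{M_V}^{\mathrm{T}}(\theta,0)\,\mathbf{a}_{M_V}^{*}(\theta_{\mathrm{beam}},0)\right|^2 \frac{1}{M_H}\left|\mathbf{a}_{M_H}^{\mathrm{T}}(\varphi,\theta)\,\mathbf{a}_{M_H}^{*}(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})\right|^2 \\
&= \underbrace{\frac{1}{M_H}\left|\sum_{m=1}^{M_H} e^{-j\frac{2\pi}{\lambda}(m-1)\Delta\left(\sin(\varphi)\cos(\theta)-\sin(\varphi_{\mathrm{beam}})\cos(\theta_{\mathrm{beam}})\right)}\right|^2}_{=A_{M_H}(\Phi)}
\underbrace{\frac{1}{M_V}\left|\sum_{m=1}^{M_V} e^{-j\frac{2\pi}{\lambda}(m-1)\Delta\left(\sin(\theta)-\sin(\theta_{\mathrm{beam}})\right)}\right|^2}_{=A_{M_V}(\Omega)},
\end{aligned} \tag{4.132}$$

where the first equality utilizes the expression in (4.128) and the fact that $\|\mathbf{a}_{M_H,M_V}(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})\|^2 = M_H M_V$. The second equality utilizes the Kronecker product property $(\mathbf{a}\otimes\mathbf{b})^{\mathrm{T}}(\mathbf{c}\otimes\mathbf{d})^* = (\mathbf{a}^{\mathrm{T}}\mathbf{c}^*)(\mathbf{b}^{\mathrm{T}}\mathbf{d}^*)$, which holds for any vectors $\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d}$ with matching dimensions. The third equality writes out the inner products of the ULA array response vectors using (4.120).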
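The Kronecker identity used in the second equality of (4.132) is easy to confirm numerically for random complex vectors. This is a sketch with hypothetical helper names (`kron`, `inner_conj`, `random_vector`); the seed is fixed only to make the run reproducible.

```python
import random

def kron(a, b):
    """Kronecker product of two vectors stored as lists."""
    return [x * y for x in a for y in b]

def inner_conj(a, b):
    """a^T b^*: sum over k of a_k * conj(b_k)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

rng = random.Random(1)

def random_vector(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

a, c = random_vector(3), random_vector(3)    # same dimension (e.g., vertical ULA vectors)
b, d = random_vector(4), random_vector(4)    # same dimension (e.g., horizontal ULA vectors)
lhs = inner_conj(kron(a, b), kron(c, d))     # (a kron b)^T (c kron d)^*
rhs = inner_conj(a, c) * inner_conj(b, d)    # (a^T c^*)(b^T d^*)
print(abs(lhs - rhs))                        # rounding-error level
```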

We notice that the beamforming gain expression in (4.132) is decomposed as the product of the two terms that we denote as $A_{M_H}(\Phi)$ and $A_{M_V}(\Omega)$, which depend on the angles through the variables

$$\Phi = \sin(\varphi)\cos(\theta) - \sin(\varphi_{\mathrm{beam}})\cos(\theta_{\mathrm{beam}}), \tag{4.133}$$

$$\Omega = \sin(\theta) - \sin(\theta_{\mathrm{beam}}). \tag{4.134}$$

Each of these functions has the same structure as the beamforming gain function considered in Section 4.3.4. More precisely, we can use the summation formula for geometric series in (4.51) and Euler's formula to compute the horizontal function $A_{M_H}(\Phi)$ as

$$A_{M_H}(\Phi) = \frac{1}{M_H}\frac{\sin^2\left(\pi M_H \frac{\Delta}{\lambda}\Phi\right)}{\sin^2\left(\pi\frac{\Delta}{\lambda}\Phi\right)} = \frac{1}{M_H}\frac{\sin^2\left(\pi L_{H,\lambda}\Phi\right)}{\sin^2\left(\pi\Delta_\lambda\Phi\right)}, \tag{4.135}$$

which equals $A(\Phi)$ in (4.79) when $M = M_H$. The last equality in (4.135) follows from using the notation $\Delta_\lambda = \Delta/\lambda$ for the normalized antenna spacing and by defining the normalized horizontal length of the UPA as

$$L_{H,\lambda} = \frac{M_H\Delta}{\lambda} = M_H\Delta_\lambda. \tag{4.136}$$

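Whenever the numerator and denominator in (4.135) are zero simultaneously (e.g., if $\Phi = 0$), it follows from the geometric series formula that the function value is $M_H$. We will not write this out to keep the expression compact. Similarly, the vertical function $A_{M_V}(\Omega)$ in (4.132) can be expressed as

$$A_{M_V}(\Omega) = \frac{1}{M_V}\frac{\sin^2\left(\pi M_V \frac{\Delta}{\lambda}\Omega\right)}{\sin^2\left(\pi\frac{\Delta}{\lambda}\Omega\right)} = \frac{1}{M_V}\frac{\sin^2\left(\pi L_{V,\lambda}\Omega\right)}{\sin^2\left(\pi\Delta_\lambda\Omega\right)}, \tag{4.137}$$

which is equal to $A(\Phi)$ in (4.79) when $\Phi = \Omega$ and $M = M_V$. The last equality in (4.137) follows from defining the normalized vertical length of the UPA as

$$L_{V,\lambda} = \frac{M_V\Delta}{\lambda} = M_V\Delta_\lambda. \tag{4.138}$$

In summary, the beamforming gain in (4.132) that is obtained in the angular direction $(\varphi,\theta)$ can be expressed as

$$A_{M_H}(\Phi)A_{M_V}(\Omega) = \frac{1}{M}\frac{\sin^2\left(\pi L_{H,\lambda}\Phi\right)}{\sin^2\left(\pi\Delta_\lambda\Phi\right)}\frac{\sin^2\left(\pi L_{V,\lambda}\Omega\right)}{\sin^2\left(\pi\Delta_\lambda\Omega\right)}. \tag{4.139}$$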
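As a sanity check of (4.132) to (4.139), the following sketch computes the beamforming gain twice: by brute force from the UPA array response vectors, and through the closed form with $\Phi$ and $\Omega$. The names are hypothetical, `d` denotes the normalized spacing $\Delta_\lambda$, and the helper `array_factor` returns the limit value $M$ whenever the numerator and denominator vanish together (a small threshold is used as an implementation choice).

```python
import cmath
import math

def ula(M, d, phi, theta):
    """ULA response (4.120) with normalized spacing d = Delta/lambda."""
    return [cmath.exp(-2j * math.pi * k * d * math.sin(phi) * math.cos(theta)) for k in range(M)]

def upa(M_H, M_V, d, phi, theta):
    """UPA response (4.128) as a Kronecker product."""
    return [x * y for x in ula(M_V, d, theta, 0.0) for y in ula(M_H, d, phi, theta)]

def gain_bruteforce(M_H, M_V, d, phi, theta, phi_b, theta_b):
    """Left-hand side of (4.132): |a^T(phi, theta) a^*(beam)|^2 / ||a(beam)||^2."""
    a, w = upa(M_H, M_V, d, phi, theta), upa(M_H, M_V, d, phi_b, theta_b)
    return abs(sum(x * y.conjugate() for x, y in zip(a, w)))**2 / (M_H * M_V)

def array_factor(M, d, x):
    """A_M(x) from (4.135)/(4.137); returns the limit M where numerator and denominator vanish."""
    s = math.sin(math.pi * d * x)
    if abs(s) < 1e-12:
        return float(M)
    return math.sin(math.pi * M * d * x)**2 / (M * s**2)

def gain_closed_form(M_H, M_V, d, phi, theta, phi_b, theta_b):
    """Right-hand side of (4.139), using Phi (4.133) and Omega (4.134)."""
    Phi = math.sin(phi) * math.cos(theta) - math.sin(phi_b) * math.cos(theta_b)
    Omega = math.sin(theta) - math.sin(theta_b)
    return array_factor(M_H, d, Phi) * array_factor(M_V, d, Omega)

args = (10, 4, 0.5, 0.3, -0.2, 0.1, 0.05)   # M_H, M_V, d, phi, theta, phi_beam, theta_beam
print(gain_bruteforce(*args), gain_closed_form(*args))      # the two values agree
print(gain_closed_form(10, 4, 0.5, 0.1, 0.05, 0.1, 0.05))   # maximum gain M = 40
```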

The maximum beamforming gain is $M$ and is achieved for $\varphi = \varphi_{\mathrm{beam}}$ and $\theta = \theta_{\mathrm{beam}}$ (the intended beamforming direction) because in that case, we have $\Phi = \Omega = 0$.¹³ We further notice that the maximum gain, $M$, only depends on the number of antennas and not on the antenna spacing, which aligns with the analysis of ULAs earlier in this chapter.

The beamforming gain $A_{M_H}(\Phi)A_{M_V}(\Omega)$ is generally smaller than the number of antennas $M$. Depending on the angles $(\varphi,\theta)$ and $(\varphi_{\mathrm{beam}},\theta_{\mathrm{beam}})$, the input variables can take values in the ranges $\Phi \in [-2,2]$ and $\Omega \in [-2,2]$. For given values of $\Phi$ and $\Omega$, the function value is determined by $M$, the normalized antenna spacing $\Delta_\lambda$, the normalized horizontal length $L_{H,\lambda}$, and the normalized vertical length $L_{V,\lambda}$. It is always the distances relative to the wavelength that matter in this context.


¹³ It follows from (4.132) that $A_{M_H}(0) = M_H$ and $A_{M_V}(0) = M_V$.
Example 4.19. What is the area of a UPA? How does the SIMO capacity in (4.25) depend on the UPA's area if $\Delta = \lambda/2$?

The horizontal length of the UPA is $M_H\Delta$ and the vertical length is $M_V\Delta$; thus, the area is $\mathrm{Area} = M_H\Delta \cdot M_V\Delta = M\Delta^2$. By utilizing the fact that $M = 4\,\mathrm{Area}/\lambda^2$ when $\Delta = \lambda/2$, we can express the SIMO capacity in (4.25) as

$$C = B\log_2\left(1 + \frac{PM\lambda^2}{BN_0(4\pi d)^2}\right) = B\log_2\left(1 + \frac{P}{BN_0}\frac{\mathrm{Area}}{4\pi^2 d^2}\right). \tag{4.140}$$

The SNR in this capacity expression is proportional to the array area but independent of the wavelength. The reason is that an isotropic transmit antenna radiates identically irrespective of the wavelength, while it is the area of the receiver array that determines what fraction of the signal power it captures. However, $\mathrm{Area} = M\lambda^2/4$, so if the wavelength reduces, the number of antennas must grow as $1/\lambda^2$ to maintain the array area.
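The wavelength independence in Example 4.19 can be checked numerically: substituting $M = 4\,\mathrm{Area}/\lambda^2$ into $PM\beta/(BN_0)$ with $\beta = \lambda^2/(4\pi d)^2$ gives $P\,\mathrm{Area}/(BN_0 \cdot 4\pi^2 d^2)$. The link parameters below (1 W, 10 MHz, $N_0 = 4\cdot 10^{-21}$ W/Hz, 100 m, a 0.25 m² array) are hypothetical values chosen only for illustration, as are the function names.

```python
import math

def snr_from_antennas(P, B, N0, M, wavelength, d):
    """SIMO SNR P*M*beta/(B*N0) with the free-space gain beta = lambda^2/(4*pi*d)^2."""
    beta = wavelength**2 / (4 * math.pi * d)**2
    return P * M * beta / (B * N0)

def snr_from_area(P, B, N0, area, d):
    """Same SNR written via the array area, valid for half-wavelength spacing."""
    return P / (B * N0) * area / (4 * math.pi**2 * d**2)

# Hypothetical link: 1 W, 10 MHz, N0 = 4e-21 W/Hz, distance 100 m, fixed 0.25 m^2 array.
P, B, N0, d, area = 1.0, 10e6, 4e-21, 100.0, 0.25
for wavelength in (0.1, 0.01):
    M = round(4 * area / wavelength**2)   # Area = M * lambda^2 / 4  =>  M = 4 * Area / lambda^2
    print(M, snr_from_antennas(P, B, N0, M, wavelength, d), snr_from_area(P, B, N0, area, d))
```

Shrinking the wavelength tenfold requires one hundred times more antennas (100 versus 10000 here), while the SNR stays the same.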


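By following the same steps as in Section 4.3.4, one can show that the numerator of $A_{M_H}(\Phi)$ is a periodic function of $\Phi$ with period $1/L_{H,\lambda} = 1/(M_H\Delta_\lambda)$ and the denominator is periodic with period $1/\Delta_\lambda$. Similarly, the numerator of $A_{M_V}(\Omega)$ is a periodic function of $\Omega$ with period $1/L_{V,\lambda} = 1/(M_V\Delta_\lambda)$ and the denominator is periodic with period $1/\Delta_\lambda$. Hence, both $A_{M_H}(\Phi)$ and $A_{M_V}(\Omega)$ have a period of $1/\Delta_\lambda$. We also have that

$$A_{M_H}\!\left(\frac{n}{L_{H,\lambda}}\right) = 0, \quad n = \pm 1,\ldots,\pm(M_H-1), \tag{4.141}$$

$$A_{M_V}\!\left(\frac{n}{L_{V,\lambda}}\right) = 0, \quad n = \pm 1,\ldots,\pm(M_V-1). \tag{4.142}$$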


These points correspond to the nulls of the beamforming gain pattern and thereby characterize the beamwidths. It is sufficient that one of these functions is zero to obtain a null. This implies that for a given value of $\theta$, we can find a value of $\varphi$ that results in a null (and vice versa). One can measure both the horizontal and vertical beamwidths, which are generally different.

We will consider a few special cases to shed light on the angular locations of the nulls. We begin by considering the horizontal plane where $\theta = \theta_{\mathrm{beam}} = 0$. We then have $\Omega = \sin(\theta) - \sin(\theta_{\mathrm{beam}}) = 0$ and $\Phi = \sin(\varphi)\cos(\theta) - \sin(\varphi_{\mathrm{beam}})\cos(\theta_{\mathrm{beam}}) = \sin(\varphi) - \sin(\varphi_{\mathrm{beam}})$. The latter coincides with (4.78) that was derived for a ULA; thus, the horizontal beamwidth of a UPA is the same as for a ULA with the same number of antennas horizontally.

Figure 4.38: Comparison of the beamforming gain in (4.139) with a UPA ($M_H = 10$, $M_V = 4$) and a ULA ($M = M_H = 10$, $M_V = 1$) in the horizontal plane where $\theta = 0$. The arrays transmit in the broadside direction $\varphi_{\mathrm{beam}} = 0$ and $\theta_{\mathrm{beam}} = 0$.

Figure 4.38 shows the beamforming gain in (4.139) when transmitting in the broadside direction where $\varphi_{\mathrm{beam}} = 0$ and $\theta_{\mathrm{beam}} = 0$, which results in $A_{M_H}(\Phi)A_{M_V}(\Omega) = M_V A_{M_H}(\sin(\varphi))$ in the horizontal plane. The antenna spacing is $\Delta_\lambda = 1/2$ wavelengths. We compare two arrays: i) a UPA with $M_H = 10$ horizontal antennas and $M_V = 4$ vertical antennas; ii) a ULA with $M = M_H = 10$ horizontal antennas (and $M_V = 1$). The beamforming gain is shown as a function of the observation azimuth angle $\varphi$ in the horizontal plane (where $\theta = 0$). The figure validates that the beamforming gain in this plane is only determined by the horizontal lengths of the arrays, which are identical for the UPA and ULA. Hence, the null locations and the shape of the lobes are the same in both cases. However, since the UPA has $M = 40$ antennas while the ULA has $M = 10$, the UPA's gain pattern is shifted upwards by $10\log_{10}(40/10) \approx 6$ dB relative to the ULA's. The maximum beamforming gain for the UPA is $40 \approx 16$ dB, whereas the maximum beamforming gain for the ULA is $10$ dB.
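The constant 6 dB offset between the two curves in Figure 4.38 follows from (4.139): with $\theta = \theta_{\mathrm{beam}} = 0$ and a broadside beam, the vertical factor is simply $A_{M_V}(0) = M_V$. A minimal sketch, with hypothetical function names and $\Delta_\lambda = 1/2$:

```python
import math

def array_factor(M, d, x):
    """A_M(x) from (4.135)/(4.137) with normalized spacing d; equals M where it is 0/0."""
    s = math.sin(math.pi * d * x)
    if abs(s) < 1e-12:
        return float(M)
    return math.sin(math.pi * M * d * x)**2 / (M * s**2)

def gain_horizontal_broadside(M_H, M_V, d, phi):
    """Gain (4.139) for theta = 0 and a broadside beam: Phi = sin(phi), Omega = 0."""
    return array_factor(M_H, d, math.sin(phi)) * array_factor(M_V, d, 0.0)

d = 0.5
print(10 * math.log10(gain_horizontal_broadside(10, 4, d, 0.0)))   # about 16.0 dB (UPA, M = 40)
print(10 * math.log10(gain_horizontal_broadside(10, 1, d, 0.0)))   # 10.0 dB (ULA, M = 10)
for phi in (0.05, 0.1, 0.3):
    ratio = gain_horizontal_broadside(10, 4, d, phi) / gain_horizontal_broadside(10, 1, d, phi)
    print(round(10 * math.log10(ratio), 2))   # 6.02 dB at every angle away from the nulls
```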

If we keep transmitting in the broadside direction but consider the plane where $\theta = \pi/4$, then we have $\Omega = \sin(\theta) - \sin(\theta_{\mathrm{beam}}) = 1/\sqrt{2}$ and $\Phi = \sin(\varphi)\cos(\theta) - \sin(\varphi_{\mathrm{beam}})\cos(\theta_{\mathrm{beam}}) = \sin(\varphi)/\sqrt{2}$. The beamforming gain for different azimuth angles will then be determined by $A_{M_H}(\Phi)A_{M_V}(\Omega) = A_{M_H}(\sin(\varphi)/\sqrt{2})A_{M_V}(1/\sqrt{2})$. The division by $\sqrt{2}$ in the first factor leads to a widening of the beamwidth compared to having $\theta = 0$, while the second factor leads to a loss in beamforming gain since $A_{M_V}(1/\sqrt{2}) < 1$ for $M_V > 1$. Figure 4.39 compares the beamforming gains observed at different azimuth angles when the elevation angle is either $\theta = 0$ or $\theta = \pi/4$. As expected, the nulls move outwards when $\theta$ increases so that the beamwidth increases, but the beamforming gain is reduced.

Figure 4.39: The beamforming gain that is observed in different azimuth directions $\varphi$ when a UPA with $M_H = 10$ and $M_V = 4$ transmits a beam in the broadside direction $\varphi_{\mathrm{beam}} = 0$ and $\theta_{\mathrm{beam}} = 0$. The pattern depends on the elevation angle $\theta$.

The vertical beamwidth only depends on the number of vertical antennas (for a given antenna spacing). To show this, we continue transmitting in the broadside direction (i.e., $\varphi_{\mathrm{beam}} = \theta_{\mathrm{beam}} = 0$) and consider the vertical plane where $\varphi = 0$. We then have $\Phi = \sin(\varphi)\cos(\theta) - \sin(\varphi_{\mathrm{beam}})\cos(\theta_{\mathrm{beam}}) = 0$ and $A_{M_H}(0) = M_H$. On the other hand, $\Omega = \sin(\theta) - \sin(\theta_{\mathrm{beam}}) = \sin(\theta)$. From (4.142), the $2\lfloor L_{V,\lambda}\rfloor$ null directions are obtained as¹⁴

$$\theta = \arcsin\left(\frac{n}{L_{V,\lambda}}\right), \quad n = \pm 1,\ldots,\pm\lfloor L_{V,\lambda}\rfloor. \tag{4.143}$$


A ULA with $M_V = 1$ and normalized vertical length $L_{V,\lambda} = 0.5$ has no nulls in the vertical plane, so its beams have no vertical directivity. However, if we extend the ULA to a UPA by adding antennas in the vertical dimension, we achieve a directive beam in both the horizontal and vertical planes. Figure 4.40 shows the beamforming gain in this vertical plane for the same UPA and ULA as in Figure 4.38. The ULA achieves a constant gain for all elevation angles. However, the UPA with $M_V = 4$ has a normalized vertical length of $L_{V,\lambda} = 2$, so there are $2L_{V,\lambda} = 4$ nulls in the vertical plane. The null directions are $\theta = \pm\arcsin(1/2) = \pm\pi/6$ and $\theta = \pm\arcsin(1) = \pm\pi/2$.

Figure 4.40: Comparison of the beamforming gain in (4.139) with a UPA ($M_H = 10$, $M_V = 4$) and a ULA ($M = M_H = 10$, $M_V = 1$) in the vertical plane where $\varphi = 0$. The arrays transmit in the broadside direction $\varphi_{\mathrm{beam}} = 0$ and $\theta_{\mathrm{beam}} = 0$.

Figure 4.41 shows the beamforming gain pattern in the 3D half-space $x > 0$ when the antenna array is deployed in the yz-plane and beamforms in the broadside direction. Figure 4.41(a) considers the UPA with $M_H = 10$ and $M_V = 4$, while Figure 4.41(b) considers the ULA with $M_H = 10$ and $M_V = 1$. The dotted black curve shows the angles representing the horizontal plane previously considered in Figure 4.38, while the dashed blue curve shows the vertical plane previously considered in Figure 4.40. The white areas represent directions that are close to the nulls. We notice that the UPA achieves directivity in both the azimuth and elevation directions, while the ULA has an azimuth directivity but spreads its signal equally for all elevation angles. The 3D directivity achieved by a UPA makes the transmission more confined to the directions close to the intended receiver.

¹⁴ All the nulls of the function $A_{M_V}(\Omega)$ are given by (4.142). We should find the values of $\theta$ that give $\Omega = \sin(\theta) = n/L_{V,\lambda}$ for $n = \pm 1,\ldots,\pm(M_V-1)$. This equation can only be solved when $|n| \le L_{V,\lambda}$, and the feasible solutions are given in (4.143).


Example 4.20. Consider a UPA with $M_H = 10$, $M_V = 4$, and $\Delta_\lambda = 1/2$. If it beamforms in the broadside direction as in Figure 4.41(a), what is the first-null beamwidth in the horizontal and vertical planes?

In the horizontal plane where $\theta = 0$, $\Phi$ in (4.133) becomes

$$\Phi = \sin(\varphi)\cos(0) - \sin(0)\cos(0) = \sin(\varphi) \tag{4.144}$$

since $\varphi_{\mathrm{beam}} = \theta_{\mathrm{beam}} = 0$. The nulls in the horizontal plane occur when $A_{M_H}(\Phi) = 0$. Since $L_{H,\lambda} = M_H\Delta_\lambda = 5$ wavelengths, it follows from (4.141) that the first nulls are at the azimuth angles $\varphi = \pm\arcsin(1/5)$. Hence, the first-null beamwidth is $2\arcsin(1/5) \approx 0.4$ rad ($23^\circ$) in the horizontal plane.

In the vertical plane where $\varphi = 0$, $\Omega$ in (4.134) becomes

$$\Omega = \sin(\theta) - \sin(0) = \sin(\theta). \tag{4.145}$$

The nulls in the vertical plane occur when $A_{M_V}(\Omega) = 0$. Since $L_{V,\lambda} = M_V\Delta_\lambda = 2$ wavelengths, it follows from (4.142) that the first nulls are at the elevation angles $\theta = \pm\arcsin(1/2) = \pm\pi/6$. Hence, the first-null beamwidth is $2\pi/6 = \pi/3$ ($60^\circ$) in the vertical plane.
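The numbers in Example 4.20 and the null directions (4.143) can be reproduced as follows. This sketch assumes an antenna spacing below one wavelength, so that $\lfloor L \rfloor \le M - 1$ and every listed angle is a genuine null; `first_null_beamwidth` and `null_directions` are hypothetical names.

```python
import math

def first_null_beamwidth(L):
    """First-null beamwidth 2*arcsin(1/L) (radians) in a principal plane; requires L >= 1."""
    return 2 * math.asin(1 / L)

def null_directions(L):
    """Null angles arcsin(n/L), n = +-1, ..., +-floor(L), as in (4.143).
    Assumes spacing below one wavelength, so that floor(L) <= M - 1."""
    n_max = math.floor(L)
    return sorted(s * math.asin(n / L) for n in range(1, n_max + 1) for s in (-1, 1))

M_H, M_V, d = 10, 4, 0.5
L_H, L_V = M_H * d, M_V * d                      # (4.136) and (4.138)
print(math.degrees(first_null_beamwidth(L_H)))   # about 23.1 degrees (horizontal plane)
print(math.degrees(first_null_beamwidth(L_V)))   # about 60 degrees (vertical plane)
print([round(math.degrees(t), 1) for t in null_directions(L_V)])   # [-90.0, -30.0, 30.0, 90.0]
```

A ULA with $L_{V,\lambda} = 0.5$ gives an empty list, which matches the statement that it has no vertical nulls.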
(a) Beamforming with a UPA having $M_H = 10$ horizontal antennas and $M_V = 4$ vertical antennas.
(b) Beamforming with a ULA having $M_H = 10$ horizontal antennas.

Figure 4.41: The beamforming gain that is observed in different 3D directions for the UPA and ULA setups that were considered in Figure 4.38 and Figure 4.40. The dotted black curves show the gain variations in the horizontal plane where $\theta = 0$ and the dashed blue curves show the gain variations in the vertical plane where $\varphi = 0$. These are the same gain patterns as shown in the previous figures.
When a plane wave impinges on the UPA from the angle $(\varphi,\theta)$, the antennas in the array take simultaneous samples of the waveform. Suppose the information-bearing signal has the wavelength $\lambda$. The angle-of-arrival determines the phase-shift differences between the antennas and, thereby, what spatial frequencies in the range $[-1/\lambda, 1/\lambda]$ are present in the channel vector. Since the UPA extends in two dimensions, the channel vector can resolve spatial frequencies horizontally and vertically, which are generally different. Recall that we consider a UPA deployed in the negative parts of the yz-plane with one antenna in the origin. At any given time, the phase-shift seen along the horizontal negative y-axis is

$$\frac{2\pi}{\lambda}\begin{bmatrix} 0 \\ -y \\ 0 \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} \cos(\varphi)\cos(\theta) \\ \sin(\varphi)\cos(\theta) \\ \sin(\theta) \end{bmatrix} = -\frac{2\pi}{\lambda} y \sin(\varphi)\cos(\theta), \tag{4.146}$$

relative to the origin. This implies that the channel contains the horizontal spatial frequency $\sin(\varphi)\cos(\theta)/\lambda$ periods per meter. Similarly, the phase-shift seen along the negative vertical z-axis is

$$\frac{2\pi}{\lambda}\begin{bmatrix} 0 \\ 0 \\ -z \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} \cos(\varphi)\cos(\theta) \\ \sin(\varphi)\cos(\theta) \\ \sin(\theta) \end{bmatrix} = -\frac{2\pi}{\lambda} z \sin(\theta), \tag{4.147}$$

thus the channel contains the vertical spatial frequency $\sin(\theta)/\lambda$ periods per meter. The vertical frequency depends on the elevation angle, while the horizontal frequency depends on both the azimuth and elevation angles. Each frequency can take values in the range $[-1/\lambda, 1/\lambda]$, but since both values depend on the elevation angle, only some combinations of frequencies can occur. Figure 4.42 shows the feasible combinations, which are contained within a circle with a radius of $1/\lambda$. Points at the outer boundary are achieved when $\varphi = \pm\pi/2$ while $\theta$ is varied throughout its feasible range.

Figure 4.42: The combinations of horizontal and vertical spatial frequencies that the channel to/from a UPA can contain are all contained in a circle with the radius $1/\lambda$. The horizontal frequency is $\sin(\varphi)\cos(\theta)/\lambda$ and the vertical frequency is $\sin(\theta)/\lambda$, where $\varphi$ is the azimuth angle and $\theta$ is the elevation angle.

The concept of horizontal and vertical spatial frequencies is further illustrated in Figure 4.43 for different angles-of-arrival. The coloring shows the real part of the impinging plane wave at a time instance when the phase is zero at the origin. The wave variations are shown for a square area with width $4\lambda$ and height $4\lambda$, but the antenna array only samples it at 81 discrete points; that is, the UPA has $M_H = M_V = 9$ antennas per dimension with the spacing $\Delta = \lambda/2$. Figure 4.43(a) considers a plane wave arriving from the broadside direction $\varphi = \theta = 0$. In this case, the horizontal and vertical spatial frequencies are zero because the UPA is deployed perpendicularly to the direction the wave travels. No phase variations exist between the antennas; therefore, the entire array surface has the same color. Figure 4.43(b) considers a plane wave arriving from the direction $\varphi = \pi/6$, $\theta = 0$, which represents a rotation in the horizontal plane. The horizontal spatial frequency is $\sin(\pi/6)\cos(0)/\lambda = 1/(2\lambda)$, which explains why the waveform repeats itself at points that are separated by $2\lambda$ horizontally. The vertical spatial frequency is $\sin(0)/\lambda = 0$, as seen from the fact that there are no vertical variations in the waveform. Figure 4.43(c) considers the case of $\varphi = 0$, $\theta = \pi/4$, where the plane wave is rotated in the vertical plane compared to the UPA. The horizontal frequency is zero, while the vertical frequency becomes $\sin(\pi/4)/\lambda = 1/(\sqrt{2}\lambda)$, so the wave repeats itself at points that have a vertical separation of $\sqrt{2}\lambda$. Finally, Figure 4.43(d) considers a plane wave arriving from the direction $\varphi = \pi/4$, $\theta = \pi/4$, for which both the horizontal and vertical frequencies are non-zero. The horizontal spatial frequency is $\sin(\pi/4)\cos(\pi/4)/\lambda = 1/(2\lambda)$, which is the same as in Figure 4.43(b). The vertical spatial frequency is the same as in Figure 4.43(c); thus, these frequencies are simultaneously achievable (i.e., they are within the circle shown in Figure 4.42).
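The four cases of Figure 4.43 can be tabulated with a short sketch based on (4.146) and (4.147). Distances are measured in wavelengths ($\lambda = 1$), the repetition distance along each axis is taken as the reciprocal of the spatial frequency, and `spatial_frequencies` is a hypothetical name.

```python
import math

def spatial_frequencies(phi, theta, wavelength):
    """Horizontal and vertical spatial frequencies (periods per meter), from (4.146) and (4.147)."""
    return math.sin(phi) * math.cos(theta) / wavelength, math.sin(theta) / wavelength

wavelength = 1.0   # measure all distances in wavelengths
cases = {"(a)": (0.0, 0.0), "(b)": (math.pi / 6, 0.0),
         "(c)": (0.0, math.pi / 4), "(d)": (math.pi / 4, math.pi / 4)}
for name, (phi, theta) in cases.items():
    f_h, f_v = spatial_frequencies(phi, theta, wavelength)
    # Repetition distance along each axis is 1/frequency (no repetition if the frequency is zero).
    rep_h = 1 / abs(f_h) if abs(f_h) > 1e-12 else math.inf
    rep_v = 1 / abs(f_v) if abs(f_v) > 1e-12 else math.inf
    inside = math.hypot(f_h, f_v) <= 1 / wavelength + 1e-12   # within the circle of Figure 4.42
    print(f"{name} f_h={f_h:.3f} f_v={f_v:.3f} repeat_h={rep_h:.3f} repeat_v={rep_v:.3f} {inside}")
```

Case (b) repeats every 2 wavelengths horizontally, case (c) every $\sqrt{2} \approx 1.414$ wavelengths vertically, and case (d) combines both.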


Example 4.21. What fraction of all horizontal/vertical spatial frequency combinations is practically achievable?

The horizontal spatial frequency can take any value in the range $[-1/\lambda, 1/\lambda]$, which is a range of length $2/\lambda$, and the same holds for the vertical spatial frequency. This corresponds to a total area of $4/\lambda^2$ of possible combinations. However, the practically achievable combinations are contained in the circle in Figure 4.42, which has the area $\pi(1/\lambda)^2$. Hence, a fraction $\pi(1/\lambda)^2/(4/\lambda^2) = \pi/4 \approx 0.79$ of all combinations is practically achievable.
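The ratio $\pi/4$ in Example 4.21 can be cross-checked with a simple Monte Carlo experiment: draw frequency pairs uniformly from the square and count how many fall inside the circle of radius $1/\lambda$. The sketch measures frequencies in units of $1/\lambda$; the function name `achievable_fraction` and the sample size are illustrative choices.

```python
import math
import random

def achievable_fraction(num_samples, seed=0):
    """Monte Carlo estimate of the fraction of uniformly drawn frequency pairs in [-1, 1]^2
    (units of 1/lambda) that fall inside the disk of radius 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        f_h, f_v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        hits += (f_h**2 + f_v**2 <= 1.0)
    return hits / num_samples

print(math.pi / 4)                    # exact fraction from Example 4.21, about 0.785
print(achievable_fraction(200_000))   # Monte Carlo estimate, close to the exact value
```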
(a) Angle-of-arrival: $\varphi = 0$, $\theta = 0$. (b) Angle-of-arrival: $\varphi = \pi/6$, $\theta = 0$.
(c) Angle-of-arrival: $\varphi = 0$, $\theta = \pi/4$. (d) Angle-of-arrival: $\varphi = \pi/4$, $\theta = \pi/4$.

Figure 4.43: When a plane wave with wavelength $\lambda$ impinges on a UPA, the angle-of-arrival $(\varphi,\theta)$ determines the wave variations simultaneously observable over the array's surface. The real part of the wave is shown for four different angles-of-arrival using colors to represent the value. The horizontal and vertical spatial frequencies differ depending on the angle-of-arrival, as seen from the color patterns. The UPA consists of 81 antennas ($M_H = M_V = 9$) with the spacing $\Delta = \lambda/2$.
4.5.4 Effective Array Response with Directive Antennas


Another benefit of specifying the array response vectors in terms of both the azimuth and elevation angles is that we can readily extend the model to support arrays of directive antennas. Recall from Section 1.1.3 that the directivity of an antenna is determined by the antenna gain function $G(\varphi,\theta)$, which specifies the angular variations in the antenna gain compared to an isotropic antenna. We will analyze a MISO channel where the $M$ transmit antennas have the same antenna gain function $G_t(\varphi,\theta)$, while the receive antenna is isotropic. We can then define the effective array response as

$$\sqrt{G_t(\varphi,\theta)}\,\mathbf{a}_M(\varphi,\theta) \tag{4.148}$$

and let $\beta_{\mathrm{iso}} = \lambda^2/(4\pi d)^2$ denote the channel gain when both the transmitter and receiver have isotropic antennas. If the transmitter has a ULA with antenna spacing $\Delta$, then the channel vector becomes

$$\mathbf{h} = \sqrt{\beta_{\mathrm{iso}}}\sqrt{G_t(\varphi,\theta)}\,\mathbf{a}_M(\varphi,\theta) = \sqrt{\beta_{\mathrm{iso}}G_t(\varphi,\theta)}\begin{bmatrix} 1 \\ e^{-j\frac{2\pi}{\lambda}\Delta\sin(\varphi)\cos(\theta)} \\ \vdots \\ e^{-j\frac{2\pi}{\lambda}(M-1)\Delta\sin(\varphi)\cos(\theta)} \end{bmatrix}. \tag{4.149}$$

The capacity of this MISO channel can be computed using (4.46) as

$$C = B\log_2\left(1 + \frac{P\|\mathbf{h}\|^2}{BN_0}\right) = B\log_2\left(1 + \frac{PG_t(\varphi,\theta)M\beta_{\mathrm{iso}}}{BN_0}\right). \tag{4.150}$$

The only difference from the capacity expression in (4.47) for ULAs with
isotropic antennas is that the SNR is multiplied by the antenna gain $G_{\mathrm{t}}(\varphi,\theta)$.
Hence, if $G_{\mathrm{t}}(\varphi,\theta) \neq 0$, a beamforming gain of $M$ can be achieved using
an array of directive antennas. This is the same beamforming gain as with
isotropic antennas; thus, the SNR grows proportionally to the number of
antennas, but the proportionality constant depends on the antenna gain
in the angular direction leading to the receiver. By replacing $G_{\mathrm{t}}(\varphi,\theta)$ with
$G_{\mathrm{r}}(\varphi,\theta)$, the capacity expression in (4.150) applies to a SIMO channel where
the receiver is equipped with $M$ antennas with the gain function $G_{\mathrm{r}}(\varphi,\theta)$ and
the transmitter is equipped with an isotropic antenna. Note that the MISO
capacity is achieved using the MRT vector $\mathbf{a}_M^*(\varphi,\theta)/\sqrt{M}$, while the SIMO
capacity is achieved using the MRC vector $\mathbf{a}_M(\varphi,\theta)/\sqrt{M}$. These are the same
vectors as with isotropic antennas because the antenna gain only changes the
scaling of the channel vector, not its direction in the vector space.
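To make the scaling argument concrete, the following minimal Python sketch builds the channel vector in (4.149) and checks that MRT attains the SNR inside (4.150). It assumes the phase convention $e^{-j2\pi m\Delta\sin(\varphi)\cos(\theta)/\lambda}$ for the array response and treats the antenna gain as a given number `g_t`; the function names and numerical values are illustrative and not taken from the text.

```python
import cmath
import math


def ula_response(M, phi, theta, spacing_over_lambda=0.5):
    """Array response a_M(phi, theta) of a ULA (the phase sign is an assumed convention)."""
    return [
        cmath.exp(-2j * math.pi * m * spacing_over_lambda * math.sin(phi) * math.cos(theta))
        for m in range(M)
    ]


def directive_miso_channel(M, phi, theta, beta_iso, g_t):
    """Channel vector h = sqrt(beta_iso * G_t) * a_M(phi, theta), cf. (4.149)."""
    scale = math.sqrt(beta_iso * g_t)
    return [scale * a for a in ula_response(M, phi, theta)]


def mrt_snr(h, P, B, N0):
    """SNR P * |h^T w|^2 / (B * N0) with the MRT precoder w = conj(h) / ||h||."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    w = [x.conjugate() / norm for x in h]
    received_amplitude = sum(hm * wm for hm, wm in zip(h, w))
    return P * abs(received_amplitude) ** 2 / (B * N0)


# Illustrative numbers (not from the text): M = 8 antennas with gain G_t = 2.5.
M, phi, theta = 8, 0.3, 0.1
beta_iso, g_t, P, B, N0 = 1e-6, 2.5, 1.0, 1.0, 1e-9
h = directive_miso_channel(M, phi, theta, beta_iso, g_t)
snr = mrt_snr(h, P, B, N0)
# The two values agree up to rounding: the antenna gain only rescales the SNR.
print(snr, P * g_t * M * beta_iso / (B * N0))
```

Because `h` is a positive real multiple of `a_M`, the normalized MRT vector coincides with $\mathbf{a}_M^*(\varphi,\theta)/\sqrt{M}$, regardless of the value of `g_t`.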

Recall that the primary purpose of beamforming is to control the directivity 
of the transmission. A directive antenna has a fixed directivity, while an array 
of isotropic antennas can form beams in any direction and always achieve the 
same maximum gain. In practice, arrays of weakly directive antennas are often 


Figure 4.44: A ULA with cosine antennas can form beams in most directions, but the strength 
of the beams will depend on the beam direction. The maximum gain is achieved in the broadside 
direction and then tapers off for other angles, determined by the shape of the antenna gain 
function. 


utilized when not all angular directions are important. For example, an array
might be deployed to serve user devices in a 120° sector of the horizontal
plane; that is, $\varphi \in [-\pi/3, \pi/3]$. We can then utilize an array of the cosine
antenna from Section 1.1.3, which has the gain function


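$$G(\varphi,\theta) = \begin{cases} 4\cos(\varphi)\cos(\theta), & \text{if } \varphi \in [-\pi/2, \pi/2],\ \theta \in [-\pi/2, \pi/2], \\ 0, & \text{elsewhere}. \end{cases} \tag{4.151}$$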


It provides the maximum antenna gain of 4 (i.e., 6 dBi) when transmitting
to receivers located in the direction $(\varphi,\theta) = (0,0)$ and an antenna gain
of 2 (i.e., 3 dBi) when transmitting to receivers located in the directions
$(\varphi,\theta) = (\pm\pi/3, 0)$ at the edges of a 120° sector. Since these gains are larger
than one, all users located in the intended sector will benefit from having this
directive antenna (compared to having an isotropic antenna), but to a varying
extent. Note that the antenna gain also varies with the elevation angle, but
every user located in the region $\varphi \in [-\pi/3, \pi/3]$, $\theta \in [-\pi/3, \pi/3]$ will obtain
an antenna gain of at least one (equality holds only at the corners, where
$4\cos(\pi/3)\cos(\pi/3) = 1$); thus, the cosine antenna is preferable to an
isotropic transmit antenna.
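The gain values quoted above can be checked numerically. The short Python sketch below implements the cosine gain function (4.151) and scans a grid over the 120° sector in both angles; the grid resolution is an arbitrary illustrative choice.

```python
import math


def cosine_gain(phi, theta):
    """Cosine antenna gain function (4.151); angles in radians."""
    if abs(phi) <= math.pi / 2 and abs(theta) <= math.pi / 2:
        return 4 * math.cos(phi) * math.cos(theta)
    return 0.0


def to_dbi(gain):
    """Convert a linear antenna gain to dBi."""
    return 10 * math.log10(gain)


print(round(to_dbi(cosine_gain(0.0, 0.0)), 2))            # 6.02
print(round(to_dbi(cosine_gain(math.pi / 3, 0.0)), 2))    # 3.01
print(round(cosine_gain(math.pi / 3, math.pi / 3), 6))    # 1.0 (sector corner)

# Smallest gain over a 41 x 41 grid covering [-pi/3, pi/3] in both angles.
grid = [-math.pi / 3 + k * (2 * math.pi / 3) / 40 for k in range(41)]
min_gain = min(cosine_gain(p, t) for p in grid for t in grid)
print(round(min_gain, 6))                                 # 1.0
```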

The combination of antenna directivity and beamforming makes the radi- 
ated signal even more directive than when using isotropic antennas, but the 
joint gain also becomes dependent on the beam direction. Figure 4.44 illus- 
trates this property by showing a collection of beams transmitted in different 
angular directions from a ULA equipped with cosine antennas. The beam 
radiated in the broadside direction is substantially stronger than the
beams radiated towards angles closer to the end-fire directions. The overlaid
Figure 4.45: If the receiver is not located in the broadside direction of the ULA, electrical
beamforming can be utilized to phase-shift the signals to form a beam in the angular direction
$\varphi$ leading toward the receiver. Alternatively, mechanical beamforming can be utilized where the
transmitter is physically rotated so the receiver is in the new broadside direction.


shape of the cosine antenna gain demonstrates how it dictates the strength 
that beams can get in different directions. 

We will now take a closer look at these properties. Figure 4.45 illustrates a
setup where the transmitter is equipped with a ULA of cosine antennas with
their maximum gain in the azimuth direction $\varphi = 0$. The antenna spacing is
$\Delta = \lambda/2$. The receiver is located in another angular direction $\varphi \neq 0$. We will
compare two ways of handling this situation. The first solution is to apply
MRT, as described earlier in this chapter. We can refer to this as electrical
beamforming since we are phase-shifting the radiated signals to form a beam
in the desired direction. Another potential solution is to physically rotate
the transmitter by the angle $\varphi$ so that the maximum gain is achieved in the
direction towards the receiver. We refer to this as mechanical beamforming,
and we can then transmit the same signal from every antenna. These solutions
have different pros and cons, which we will explain by a numerical example.

Figure 4.46 shows the joint beamforming and antenna gain achieved in
different angular directions using a ULA with $M = 10$ cosine antennas. The
receiver is in the direction $(\varphi,\theta) = (\pi/4, 0)$. Mechanical beamforming achieves
a beamforming gain of 10 dB and the maximum antenna gain of 6 dBi, resulting
in a joint gain of 16 dBi. Electrical beamforming also achieves a beamforming
gain of 10 dB, but the antenna gain is only $4\cos(\pi/4)\cos(0) = 2\sqrt{2} \approx 4.5$ dBi;
thus, the joint gain is 14.5 dBi. If the transmit power is the same in both cases,
the receiver will achieve a 1.5 dB lower SNR when using electrical beamforming.
Nevertheless, it is beneficial to utilize directive antennas in this setup because
the gain is 4.5 dB larger than if isotropic antennas had been utilized,
as was the case in Figure 4.17. Another difference between mechanical and
electrical beamforming is that the beamwidth becomes broader in the latter
case, as shown in Figure 4.46. This can lead to more interference towards
undesired receivers located in roughly the same direction as the desired receiver.
Figure 4.46: The joint beamforming and antenna gain that is observed in different azimuth
directions $\varphi$ when a ULA with $M = 10$ cosine antennas transmits a beam in the direction
$\varphi_{\mathrm{beam}} = \pi/4$. We compare electrical beamforming (i.e., phase-shifting the transmitted signals
using MRT) with mechanical beamforming where the ULA is rotated by $\pi/4$ to have the
broadside towards the receiver, as shown in Figure 4.45.


However, the wider beam can also be a benefit if the receiver’s angle is only 
approximately known. 
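The numbers in this comparison can be reproduced with a few lines of Python. The sketch below computes the joint gain $M \cdot G$ in dBi for both strategies. It also computes the azimuth distance between the first nulls, assuming the standard ULA array factor, whose first nulls satisfy $\sin(\varphi) - \sin(\varphi_{\mathrm{beam}}) = \pm 1/(M\Delta/\lambda)$. The helper names are illustrative.

```python
import math


def cosine_gain(phi, theta):
    """Cosine antenna gain function (4.151)."""
    if abs(phi) <= math.pi / 2 and abs(theta) <= math.pi / 2:
        return 4 * math.cos(phi) * math.cos(theta)
    return 0.0


def joint_gain_dbi(M, phi_rx, theta_rx, mechanical):
    """Beamforming gain M times antenna gain, in dBi.

    Mechanical: the array is rotated so the receiver lies in broadside (0, 0).
    Electrical: the array stays fixed and MRT steers the beam to (phi_rx, theta_rx).
    """
    g = cosine_gain(0.0, 0.0) if mechanical else cosine_gain(phi_rx, theta_rx)
    return 10 * math.log10(M * g)


def ula_first_null_width(M, spacing_over_lambda, phi_beam):
    """Azimuth distance between the two first nulls of a (steered) ULA beam.

    Assumes the standard ULA array factor, with first nulls at
    sin(phi) - sin(phi_beam) = +/- 1 / (M * spacing_over_lambda).
    """
    step = 1 / (M * spacing_over_lambda)
    return math.asin(math.sin(phi_beam) + step) - math.asin(math.sin(phi_beam) - step)


M = 10
mech = joint_gain_dbi(M, math.pi / 4, 0.0, mechanical=True)
elec = joint_gain_dbi(M, math.pi / 4, 0.0, mechanical=False)
print(f"mechanical: {mech:.2f} dBi, electrical: {elec:.2f} dBi, loss: {mech - elec:.2f} dB")
# Broadside beam (mechanical) versus a beam steered to pi/4 (electrical), in radians.
print(ula_first_null_width(M, 0.5, 0.0), ula_first_null_width(M, 0.5, math.pi / 4))
```

The loss equals $10\log_{10}(4/(2\sqrt{2})) = 10\log_{10}(\sqrt{2}) \approx 1.5$ dB. The steered beam has a wider first-null beamwidth, which matches the broadening described in the text.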


Although mechanical beamforming might seem like a viable competing 
technology, it is seldom used in mobile communications since changes in the 
mechanical rotation at the millisecond level are associated with many practical 
implementation issues. The benefit also vanishes in MIMO scenarios where 
multiple beams are to be transmitted simultaneously in different directions to 
achieve multiplexing gains. The flexibility of electrical beamforming generally 
makes it a superior technology; however, a careful selection of the antennas 
and mechanical rotation is necessary when deploying an array to ensure that 
the antenna gain is sufficiently large within the intended coverage area. For 
example, in a cellular network where the base stations are deployed tens of 
meters above the ground, it is common to mechanically downtilt the base 
station array by around ten degrees in the elevation angle domain, to focus 
the antenna gain on the places where the prospective users are instead of 
towards the horizon. Electrical downtilt in the form of beamforming is then 
utilized to adapt the transmission to the current user location. Base station 
arrays are also rotated horizontally at the deployment stage to point toward 
the center of their intended coverage area, while electrical beamforming is 
used to point beams in the azimuth direction where the user currently resides. 
Example 4.22. Consider a UPA with $M_{\mathrm{H}} = 10$, $M_{\mathrm{V}} = 4$, $\Delta = \lambda/2$, and
cosine antennas. What is the joint beamforming and antenna gain if the UPA
is mechanically rotated to transmit to a receiver in the broadside direction?
What is the joint beamforming and antenna gain if the UPA is electrically
steered to transmit to a receiver in the direction $\varphi = 0$ and $\theta = -\pi/4$? How
does the first-null horizontal beamwidth differ between these setups?

The antenna gain function in (4.151) is $G(0,0) = 4$ in the broadside
direction, while the beamforming gain in (4.139) becomes $M = M_{\mathrm{H}}M_{\mathrm{V}} =
40$. Hence, the joint beamforming and antenna gain is $4 \cdot 40 = 160 \approx 22$ dBi.

With the electrical downtilt, the antenna gain becomes $G(0,-\pi/4) = 2\sqrt{2}$,
while the beamforming gain remains $M = M_{\mathrm{H}}M_{\mathrm{V}} = 40$. Hence, the joint
beamforming and antenna gain is $2\sqrt{2} \cdot 40 = 80\sqrt{2} \approx 20.5$ dBi.

Example 4.20 showed that $2\arcsin(1/5) \approx 0.4$ is the first-null horizontal
beamwidth when transmitting in the broadside direction. In contrast, with
the electrical downtilt, $\Phi$ in (4.133) becomes

$$\Phi = \sin(\varphi)\cos(-\pi/4) - \sin(0)\cos(-\pi/4) = \frac{\sin(\varphi)}{\sqrt{2}}. \tag{4.152}$$

According to (4.143), the first nulls in the horizontal plane occur when
$\Phi = \pm 1/L_{\mathrm{H},\lambda}$. The normalized horizontal length of the considered UPA is
$L_{\mathrm{H},\lambda} = M_{\mathrm{H}}\Delta/\lambda = 10 \cdot 0.5 = 5$ wavelengths. By solving for the azimuth angle
$\varphi$, we obtain that the first nulls are at $\varphi = \pm\arcsin(\sqrt{2}/5)$ and the first-null
horizontal beamwidth is therefore $2\arcsin(\sqrt{2}/5) \approx 0.57$.

In summary, the electrical downtilt results in a 1.5 dB loss in antenna gain
and a 42% increase in beamwidth, since $\arcsin(\sqrt{2}/5)/\arcsin(1/5) \approx 1.42$.
The benefit is that there is no need to mechanically rotate the array based on
the receiver's location in the coverage area.
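As a cross-check of Example 4.22, the following Python sketch evaluates the joint gains and the first-null horizontal beamwidths. It assumes, as in (4.152), that the horizontal cut is taken at the beam's elevation, so the first nulls satisfy $\sin(\varphi)\cos(\theta_{\mathrm{tilt}}) = \pm 1/L_{\mathrm{H},\lambda}$. The variable names are illustrative.

```python
import math

M_H, M_V, spacing_over_lambda = 10, 4, 0.5
M = M_H * M_V                       # beamforming gain with MRT
theta_tilt = -math.pi / 4           # electrical downtilt

# Antenna gains from the cosine gain function (4.151) at azimuth 0.
g_broadside = 4 * math.cos(0.0) * math.cos(0.0)
g_tilt = 4 * math.cos(0.0) * math.cos(theta_tilt)

joint_mech_dbi = 10 * math.log10(g_broadside * M)
joint_elec_dbi = 10 * math.log10(g_tilt * M)

# First nulls in the horizontal cut: sin(phi) * cos(theta_tilt) = +/- 1 / L_H,
# where L_H = M_H * Delta / lambda is the normalized horizontal length.
L_H = M_H * spacing_over_lambda
bw_mech = 2 * math.asin(1 / L_H)
bw_elec = 2 * math.asin(1 / (L_H * math.cos(theta_tilt)))

print(f"{joint_mech_dbi:.1f} dBi vs {joint_elec_dbi:.1f} dBi")
print(f"beamwidth {bw_mech:.3f} vs {bw_elec:.3f} rad, ratio {bw_elec / bw_mech:.2f}")
```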


4.5.5 Effective Isotropic Radiated Power 


The maximum radiated power in a wireless communication system is deter- 
mined by regulations, which can differ between countries and frequency bands. 
The maximum power can be quantified in terms of the total radiated power, 
denoted by P in this book, without considering how this power is distributed 
over different angular directions. In regulations, it is common to consider the 
effective isotropic radiated power (EIRP), which also includes the antenna 
and beamforming gains. Notice that the SNR in (4.150) for a MISO system 
can be factorized as 


$$\frac{P\|\mathbf{h}\|^2}{BN_0} = \underbrace{PG_{\mathrm{t}}(\varphi,\theta)M}_{\text{EIRP}}\,\frac{\beta_{\mathrm{iso}}}{BN_0}, \tag{4.153}$$

which is effectively the same as for a SISO channel with a single isotropic
antenna that radiates $PG_{\mathrm{t}}(\varphi,\theta)M$. In terms of the received signal strength,
the receiver cannot tell whether the signal was radiated isotropically or with
a strong directivity. Hence, if the goal of the regulation is to limit the worst-case
radiation intensity (e.g., to comply with health guidelines and limit
out-of-band emissions), then the EIRP must be regulated. In particular, the
maximum EIRP over all angles can be computed as


$$\mathrm{EIRP}_{\max} = \max_{\varphi,\theta}\, PG_{\mathrm{t}}(\varphi,\theta)M. \tag{4.154}$$


The maximum EIRP is proportional to the total radiated power $P$, the maximum
antenna gain $\max_{\varphi,\theta} G_{\mathrm{t}}(\varphi,\theta)$, and the beamforming gain $M$.


Example 4.23. What is the EIRP if the total radiated power is 1 W, the 
antenna gain is 4, and the beamforming gain is 10? 

The EIRP is the product of these parameters: $1 \cdot 4 \cdot 10 = 40$ W, which is often
expressed in decibel scale as 46 dBm. Thanks to the directive transmission,
the receiver will experience a received signal equivalent to the transmission of 
40 W from an isotropic antenna, although the transmitter only radiates 1 W. 
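The arithmetic of Example 4.23 amounts to a product in linear scale followed by a conversion to dBm. A minimal Python sketch follows; the helper names are illustrative.

```python
import math


def eirp_watts(total_radiated_power_w, antenna_gain, beamforming_gain):
    """EIRP = P * G * M as in (4.153); all quantities in linear scale."""
    return total_radiated_power_w * antenna_gain * beamforming_gain


def watts_to_dbm(power_w):
    """Convert watts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(power_w / 1e-3)


eirp = eirp_watts(1.0, 4.0, 10.0)
print(eirp, round(watts_to_dbm(eirp), 1))  # 40.0 46.0
```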


The EIRP limits can vary significantly between different frequency bands
and geographical regions. Within the European Union, the guidelines for
the 3.5 GHz band (utilized for 5G NR) specify a maximum EIRP of
68 dBm per 5 MHz of spectrum for base stations, while the EIRP limit is only
25 dBm per user device [55]. There is thus a large power imbalance between the
downlink and uplink transmissions, as previously discussed in relation to
Figure 1.7. However, the EIRP numbers do not tell the whole story since the
antenna/beamforming gains at the receiver side are omitted. For example, the
downlink EIRP of 68 dBm might be reached using a total radiated power of
47 dBm (i.e., 50 W) and a joint antenna and beamforming gain of 21 dBi. The
same antenna/beamforming gain can also be utilized when receiving the uplink
transmission. Hence, the difference in total radiated power determines the SNR
imbalance between uplink and downlink. The uplink EIRP limit of 25 dBm
might be reached using a total radiated power of 23 dBm (i.e., 200 mW), an
antenna gain of 2 dBi, and no beamforming gain. Importantly, base stations
are often allowed to increase their power proportionally to the bandwidth.
This is not the case for user devices, so the power imbalance between uplink
and downlink becomes more severe as the bandwidth increases.
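The link-budget arithmetic in this paragraph can be written out in decibels. The Python sketch below reproduces the quoted EIRP values and the 24 dB difference in total radiated power. The final loop makes an illustrative assumption: the base station's radiated power scales with bandwidth relative to the 5 MHz reference, while the device power stays fixed.

```python
import math


def dbm_to_watts(p_dbm):
    """Convert dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)


# Downlink: 47 dBm radiated power plus 21 dBi joint antenna/beamforming gain.
dl_power_dbm, dl_gain_dbi = 47.0, 21.0
# Uplink: 23 dBm radiated power, 2 dBi antenna gain, no beamforming gain.
ul_power_dbm, ul_gain_dbi = 23.0, 2.0

print(dl_power_dbm + dl_gain_dbi, ul_power_dbm + ul_gain_dbi)   # 68.0 25.0 (EIRP in dBm)
print(round(dbm_to_watts(dl_power_dbm), 1), round(dbm_to_watts(ul_power_dbm), 3))
# The SNR imbalance is set by the difference in total radiated power (in dB).
print(dl_power_dbm - ul_power_dbm)                              # 24.0

# Hypothetical scaling: base-station power grows with bandwidth, device power is fixed.
for bandwidth_mhz in (5, 20, 100):
    extra_db = 10 * math.log10(bandwidth_mhz / 5)
    print(bandwidth_mhz, round(dl_power_dbm + extra_db - ul_power_dbm, 1))
```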


4.5.6 MIMO Channels with Arbitrary ULAs 


A far-field MIMO channel model was provided in Section 4.4.1 for the case 
when the transmitter and receiver are equipped with ULAs of isotropic 
antennas located in the same two-dimensional plane. We will now utilize 
the array responses derived in Section 4.5.1 to generalize the expression 
Figure 4.47: Illustration of a free-space MIMO LOS communication setup where the transmitter
is equipped with a ULA with $K$ antennas and the receiver with a ULA with $M$ antennas. The
antenna spacing is $\Delta$ in each array. From the transmitter's perspective, the angles-of-departure
leading to the receiver are $(\varphi_{\mathrm{t}}, \theta_{\mathrm{t}})$. From the receiver's perspective, the angles-of-arrival of the
signal radiated by the transmitter are $(\varphi_{\mathrm{r}}, \theta_{\mathrm{r}})$.


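to support directive antennas and ULAs arbitrarily rotated in the three-dimensional
world. Figure 4.47 illustrates such a setup, where the receiver is
in the far-field of the transmitter and the channel gain between any pair of
transmit and receive antennas under the assumption of isotropic antennas is
$\beta_{\mathrm{iso}} = \lambda^2/(4\pi d)^2$. The transmitter is equipped with a ULA with $K$ antennas
and the antenna spacing is $\Delta$. These antennas have the gain function $G_{\mathrm{t}}(\varphi,\theta)$,
and the angles-of-departure leading towards the receiver are denoted by $(\varphi_{\mathrm{t}}, \theta_{\mathrm{t}})$.
The receiver has a ULA with $M$ antennas and the same antenna spacing
$\Delta$. These antennas have the gain function $G_{\mathrm{r}}(\varphi,\theta)$, and the radiated signal
impinges as a planar wavefront with the angles-of-arrival denoted by $(\varphi_{\mathrm{r}}, \theta_{\mathrm{r}})$.
We can utilize (4.149) to conclude that the channel from the first transmit
antenna to the $M$ receive antennas is $\sqrt{\beta_{\mathrm{iso}}}\sqrt{G_{\mathrm{t}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})}\sqrt{G_{\mathrm{r}}(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})}\,\mathbf{a}_M(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})$,
by also taking the antenna gain $G_{\mathrm{t}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})$ of the transmitter into account. Similarly,
from the transmitter's perspective, the channel from the $K$ transmit antennas
to the first receive antenna is $\sqrt{\beta_{\mathrm{iso}}}\sqrt{G_{\mathrm{t}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})}\sqrt{G_{\mathrm{r}}(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})}\,\mathbf{a}_K^{\mathrm{T}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})$.
By combining these results, we conclude that the complete channel matrix is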
By combining these results, we conclude that the complete channel matrix is 


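$$\mathbf{H} = \sqrt{\beta_{\mathrm{iso}}}\sqrt{G_{\mathrm{t}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})}\sqrt{G_{\mathrm{r}}(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})}\,\mathbf{a}_M(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})\mathbf{a}_K^{\mathrm{T}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}}). \tag{4.155}$$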


This is a rank-one matrix, which aligns with previous observations in this 
chapter. The non-zero singular value is 


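$$s_1 = \sqrt{\beta_{\mathrm{iso}}MK\,G_{\mathrm{t}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})G_{\mathrm{r}}(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})}, \tag{4.156}$$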


which now depends on the antenna gains at both the transmitter and receiver. 
The MIMO channel capacity becomes

$$C = \log_2\left(1 + G_{\mathrm{t}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})G_{\mathrm{r}}(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})\frac{q\beta_{\mathrm{iso}}MK}{N_0}\right), \tag{4.157}$$

which coincides with (4.96) when $G_{\mathrm{t}}(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})G_{\mathrm{r}}(\varphi_{\mathrm{r}},\theta_{\mathrm{r}}) = 1$. Depending on how
the antennas are directed, the capacity can be higher than, lower than, or
the same as with isotropic antennas. The same beamforming gain of
$MK$ is obtained irrespective of the choice of antennas (as long as the gain is
non-zero). The capacity is achieved by using the MRT vector $\mathbf{a}_K^*(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})/\sqrt{K}$
for precoding and the MRC vector $\mathbf{a}_M(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})/\sqrt{M}$ for combining.
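The rank-one structure can be verified numerically. The Python sketch below builds $\mathbf{H}$ as in (4.155) from two ULA responses and uses the fact that the Frobenius norm of a rank-one matrix equals its only non-zero singular value, so it should match (4.156). The phase convention, angles and parameter values are illustrative assumptions.

```python
import cmath
import math


def ula_response(n, phi, theta, spacing_over_lambda=0.5):
    """ULA array response with an assumed phase convention."""
    return [
        cmath.exp(-2j * math.pi * m * spacing_over_lambda * math.sin(phi) * math.cos(theta))
        for m in range(n)
    ]


def mimo_los_channel(M, K, phi_r, theta_r, phi_t, theta_t, beta_iso, g_t, g_r):
    """H = sqrt(beta_iso * G_t * G_r) * a_M(phi_r, theta_r) a_K^T(phi_t, theta_t), cf. (4.155)."""
    c = math.sqrt(beta_iso * g_t * g_r)
    a_r = ula_response(M, phi_r, theta_r)
    a_t = ula_response(K, phi_t, theta_t)
    return [[c * ar * at for at in a_t] for ar in a_r]


def frobenius_norm(H):
    return math.sqrt(sum(abs(x) ** 2 for row in H for x in row))


# Illustrative parameters (not from the text).
M, K, beta_iso, g_t, g_r = 8, 4, 1e-9, 2.0, 0.5
H = mimo_los_channel(M, K, 0.4, 0.2, -0.3, 0.1, beta_iso, g_t, g_r)
# For a rank-one matrix, the Frobenius norm equals the only non-zero singular value.
s1 = frobenius_norm(H)
print(s1, math.sqrt(beta_iso * M * K * g_t * g_r))

q, N0 = 1e-3, 1e-12
capacity = math.log2(1 + q * s1 ** 2 / N0)   # matches (4.157)
print(capacity)
```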


Example 4.24. Consider a free-space MIMO channel between a base station
and a user device, both equipped with UPAs. The Cartesian coordinates of
the center points of the base station and the user device are $(0,0,0)$ and
$(300, 300, 300\sqrt{2})$ in meters, respectively. The numbers of antennas at the base
station (receiver) and user device (transmitter) are $M = 32$ and $K = 4$,
respectively. All antennas have the cosine gain function given in (4.151). The
two UPAs are deployed along the $yz$-plane, and their broadside directions
face each other (i.e., the base station has zero gain for $x < 0$ and the device
has zero gain for $x > 300$). What is the capacity of the considered MIMO
channel for $q = 10^{-8}$ W/Hz, $N_0 = 10^{-17}$ W/Hz, and $\lambda = 0.1$ m?

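We can determine the capacity of the considered free-space MIMO channel
using (4.157) and $\beta_{\mathrm{iso}} = \lambda^2/(4\pi d)^2$ with $M = 32$, $K = 4$, $q = 10^{-8}$ W/Hz,
$N_0 = 10^{-17}$ W/Hz, and $\lambda = 0.1$ m. It becomes

$$C = \log_2\left(1 + G(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})G(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})\frac{10^{-8}\cdot 0.1^2\cdot 32\cdot 4}{10^{-17}\cdot(4\pi)^2 d^2}\right), \tag{4.158}$$

where $G(\varphi,\theta)$ is the antenna gain function in (4.151). The distance between
the transmitter and receiver is computed as

$$d = \sqrt{(300-0)^2 + (300-0)^2 + (300\sqrt{2}-0)^2} = 600 \text{ m}. \tag{4.159}$$

Let us first determine the angles-of-arrival $(\varphi_{\mathrm{r}},\theta_{\mathrm{r}})$ from the user device to the
base station. From the given geometry of the UPAs, $\varphi_{\mathrm{r}} = \pi/4$ and $\theta_{\mathrm{r}} = \pi/4$.
The angles-of-departure $(\varphi_{\mathrm{t}},\theta_{\mathrm{t}})$ from the user device to the base station are
computed as $\varphi_{\mathrm{t}} = \pi/4$ and $\theta_{\mathrm{t}} = -\pi/4$. Hence, the cosine antenna gains are
obtained as $G(\varphi_{\mathrm{t}},\theta_{\mathrm{t}}) = G(\varphi_{\mathrm{r}},\theta_{\mathrm{r}}) = 4\cos(\pi/4)\cos(\pi/4) = 2$. Inserting these values into the
capacity expression, we obtain

$$C = \log_2\left(1 + 2\cdot 2\cdot\frac{10^{-8}\cdot 0.1^2\cdot 32\cdot 4}{10^{-17}\cdot(4\pi)^2\cdot 600^2}\right) \approx 6.5 \text{ bit/symbol}. \tag{4.160}$$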
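The steps of Example 4.24 can be reproduced in Python as a sanity check. The sketch assumes that each array's local frame has its broadside along its local $+x$-axis. The device, which faces $-x$, is modeled as the global frame rotated by $\pi$ about the $z$-axis. This frame convention is an assumption made here for illustration.

```python
import math


def cosine_gain(phi, theta):
    """Cosine antenna gain function (4.151)."""
    if abs(phi) <= math.pi / 2 and abs(theta) <= math.pi / 2:
        return 4 * math.cos(phi) * math.cos(theta)
    return 0.0


def azimuth_elevation(vector):
    """Azimuth and elevation of a direction vector in a frame with broadside along +x."""
    x, y, z = vector
    return math.atan2(y, x), math.asin(z / math.sqrt(x * x + y * y + z * z))


bs = (0.0, 0.0, 0.0)
ue = (300.0, 300.0, 300.0 * math.sqrt(2))
to_ue = tuple(u - b for u, b in zip(ue, bs))
d = math.sqrt(sum(c * c for c in to_ue))

# The base station faces +x, so its local frame is the global frame.
phi_r, theta_r = azimuth_elevation(to_ue)
# The device faces -x: rotate by pi about z (x -> -x, y -> -y), an assumed convention.
to_bs = tuple(-c for c in to_ue)
phi_t, theta_t = azimuth_elevation((-to_bs[0], -to_bs[1], to_bs[2]))

q, N0, wavelength, M, K = 1e-8, 1e-17, 0.1, 32, 4
beta_iso = wavelength ** 2 / (4 * math.pi * d) ** 2
snr = cosine_gain(phi_t, theta_t) * cosine_gain(phi_r, theta_r) * q * beta_iso * M * K / N0
capacity = math.log2(1 + snr)
print(f"d = {d:.1f} m, C = {capacity:.2f} bit/symbol")
```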
(a) Horizontally polarized wave. (b) Vertically polarized wave.


Figure 4.48: Electromagnetic waves can have different polarization, as represented by the 
direction in which the electric field oscillates. The figure shows two waves with orthogonal 
polarizations oscillating horizontally or vertically. The thick arrows show the dimensions of the 
oscillations and in which direction the amplitude is positive. 


4.6 Polarization of Electromagnetic Waves 


An electromagnetic wave travels in one direction, but the electric field oscillates 
(like a sinusoid) in a perpendicular direction. When a plane wave propagates 
along one dimension of our three-dimensional world, there are two possible 
perpendicular dimensions in which the electric field could oscillate, or it can 
be a linear combination of them. The direction of the oscillations is called the 
polarization of the wave. Figure 4.48 shows an example of the wave propagating 
along the x-axis. When the electric field oscillates in the horizontal plane (i.e., 
along the y-axis), we have a horizontally polarized wave. When the electric 
field oscillates in the vertical plane (i.e., along the z-axis), we have a vertically 
polarized wave. This is an example of a pair of orthogonally polarized waves 
since the electric fields exist in entirely different dimensions. One can find 
other pairs of orthogonal waves by rotating both waves by the same angle
in the $yz$-plane. However, one cannot find more than two orthogonal waves
since there are only two perpendicular dimensions.¹⁵

Each antenna has predetermined polarization properties, in the sense that 
it radiates waves with a given polarization and responds to impinging waves 
with the same polarization. For example, a horizontally polarized antenna 
radiates and responds to waves of the horizontally polarized kind shown in 
Figure 4.48(a) and might have a physical shape similar to the thick arrow 


15 We have only exemplified linearly polarized waves for which the electric field oscillates in a
single dimension. One can also create circularly/elliptically polarized waves for which the electric
field rotates in the plane perpendicular to the direction of travel, which means that the direction
of the oscillations is time-varying. For example, if the wave travels along the x-axis, then the
wave's electric field could rotate in the yz-plane. In this case, a clockwise and a counter-clockwise
rotation lead to orthogonal polarizations. Electromagnetic waves are also characterized by their
magnetic fields, which are orthogonal to the electric fields and the direction of travel; thus, the
magnetic field oscillates in the same direction as the electric field of the orthogonally polarized wave.
illustrated along the y-axis. Similarly, a vertically polarized antenna radiates
and responds to waves of the vertically polarized kind shown in Figure 4.48(b)
and might have a physical shape similar to the thick arrow illustrated along
the z-axis. Note that the physical orientation of an antenna is essential when
interpreting these concepts. A horizontally polarized antenna can be rotated by
90° to become a vertically polarized antenna and vice versa. If a horizontally
polarized antenna is rotated by 180°, it remains horizontally polarized, but the
notions of up and down are switched, corresponding to changing the signal's
sign. The thick arrows in Figure 4.48 show the direction in which the signal
attains positive values in this coordinate system.

The analysis in this chapter has implicitly assumed that all antennas have 
matching polarization. However, there are three main reasons for generalizing 
the analysis. Firstly, we cannot control the device’s orientation in mobile 
communications since the user must be allowed to hold and rotate it arbitrarily. 
For instance, a mobile phone antenna might be vertically polarized when held 
against the ear but horizontally polarized when put on a table. This calls for 
the use of multiple antennas with different polarization at the base station 
so that it can always generate waves with the device's currently preferred
polarization.¹⁶ Secondly, two antennas with orthogonal polarizations can be co-located,
which enables doubling the number of antennas that fit in a given
physical aperture area. Thirdly, polarization creates an extra dimension that 
can be used for spatial multiplexing over LOS channels, which was recognized 
already in the 1980s [58] (i.e., before spatial multiplexing through beamforming 
was discovered). The latter property will be the focus of this section. 


4.6.1 Channel Capacity with Dual-Polarized Antennas 


Suppose the transmitter has two antennas with orthogonal polarization. Since 
the antennas have different orientations, they can be centered around the 
same point to create what is called a dual-polarized antenna: two antennas at 
one location but with orthogonal polarizations. In this section, we consider the 
setup in Figure 4.49, where the receiver is equipped with the same antenna 
configuration, including identical rotations. Just as any other MIMO system 
with M = K = 2, the considered setup can be described by the MIMO system 
model in (3.56): 

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{4.161}$$


16 Any direction of the electric field can be obtained as a superposition of two orthogonal 
electric fields, for example, generated using horizontal and vertical polarizations. Practical 
base stations often utilize dual-polarized antennas with ±45° slanted polarizations, a pair of
orthogonal polarizations between the horizontal and vertical directions. The reason is that many 
propagation environments provide better conditions for either the horizontally or vertically 
polarized wave component; for example, the vertical polarization often leads to stronger signals 
in mobile communications since the waves mostly travel horizontally between the base station 
and user device, and are reflected off vertical objects (e.g., buildings) that are better at reflecting 
vertically polarized waves [56], [57]. In any case, we can achieve a balanced power per antenna 
by dividing these dimensions equally between the antennas by using slanted polarization. 
Figure 4.49: Illustration of a setup where the transmitter and receiver are equipped with
dual-polarized antennas with identical rotations in the yz-plane. 


where the elements of $\mathbf{y} \in \mathbb{C}^2$ correspond to the two receive antennas with
orthogonal polarizations and the elements of $\mathbf{x} \in \mathbb{C}^2$ correspond to the two
transmit antennas with orthogonal polarizations. The assumption of dual-polarized
antennas only affects the modeling of the channel matrix $\mathbf{H} \in \mathbb{C}^{2\times 2}$.

We consider an LOS channel where $d$ is the distance between the dual-polarized
transmit antenna and the dual-polarized receive antenna; thus, the
channel gain is $\beta = \lambda^2/(4\pi d)^2$. If we order the antennas so that transmit antenna
$m$ and receive antenna $m$ have matching polarization, for $m = 1,2$, then we
obtain the channel matrix

$$\mathbf{H} = \begin{bmatrix} \sqrt{\beta} & 0 \\ 0 & \sqrt{\beta} \end{bmatrix} = \sqrt{\beta}\,\mathbf{I}_2. \tag{4.162}$$


The diagonal elements are $\sqrt{\beta}$, just as for a single-antenna LOS channel where
the antennas have equal polarization. In contrast, the off-diagonal elements are
zero because the corresponding antennas have orthogonal polarizations. Both
singular values equal $\sqrt{\beta}$ since the channel matrix is a scaled identity matrix.
This is an ideal type of MIMO channel for spatial multiplexing because we
can transmit two parallel data streams that experience equally strong singular
values. Hence, the water-filling power allocation will result in equal power
allocation: $q_1 = q_2 = q/2$. The channel capacity in (3.75) becomes

$$C = 2\log_2\left(1 + \frac{q\beta}{2N_0}\right) \text{ bit/symbol}. \tag{4.163}$$

The transmit precoding and receive combining that achieve the capacity are
trivial: send the $m$th stream from the $m$th transmit antenna and receive it
using only the mth receive antenna. Since the orthogonality between the data 
streams is achieved by the different polarization rather than using different 
spatial beams, it is more appropriate to call this polarization multiplexing 
than spatial multiplexing. However, the underlying MIMO capacity theory is 
the same; it is just the physical interpretation that differs. 
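To see why the water-filling allocation reduces to an equal split here, the following Python sketch implements classic water-filling over parallel channels. It applies the allocation to the two equal singular values of (4.162) and compares the result with (4.163). The function names and numerical values are illustrative, not from the text.

```python
import math


def water_filling(gains, total_power):
    """Classic water-filling over parallel channels.

    gains[i] is the SNR per unit power of channel i (here s_i^2 / N0).
    Returns the power per channel; the powers sum to total_power.
    """
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    powers = [0.0] * len(gains)
    for k in range(len(order), 0, -1):
        active = order[:k]                      # the k strongest channels
        level = (total_power + sum(1 / gains[i] for i in active)) / k
        if level - 1 / gains[active[-1]] > 0:   # weakest active channel gets power
            for i in active:
                powers[i] = level - 1 / gains[i]
            return powers
    return powers


def capacity(gains, powers):
    return sum(math.log2(1 + p * g) for p, g in zip(powers, gains))


# Dual-polarized LOS channel (4.162): both singular values equal sqrt(beta).
beta, N0, q = 1e-10, 1e-12, 1e-2   # illustrative values
gains = [beta / N0, beta / N0]
powers = water_filling(gains, q)
print(powers)                       # equal split: q/2 per stream
print(capacity(gains, powers), 2 * math.log2(1 + q * beta / (2 * N0)))
```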

It is instructive to compare the capacity of the dual-polarized 2 x 2 MIMO 
channel in (4.163) with the capacity in (4.96) for a far-field MIMO setup with 
Figure 4.50: Comparison of the capacities of 2 x 2 MIMO LOS channels with either two
single-polarized antennas or one dual-polarized antenna on each side. The SISO capacity is 
shown as a reference. 


ULAs where all the antennas have the same polarization. For $M = K = 2$,
(4.96) becomes $\log_2(1 + 4q\beta/N_0)$, where there is no multiplexing gain but a
beamforming gain of $MK = 4$. Figure 4.50 shows the capacities as a function
of $\mathrm{SNR} = q\beta/N_0$. This is the SNR achieved in a SISO system, and its capacity
$\log_2(1+\mathrm{SNR})$ is shown as a reference. The single-polarized setup achieves the
highest capacity at low SNR thanks to the beamforming gains obtained at both
the transmitter and receiver. The dual-polarized setup performs identically to
the SISO setup at low SNR because each antenna transmits isotropically, and
each receive antenna only captures power from one transmit antenna. We can
show this mathematically using the low-SNR approximation in (3.2):

$$2\log_2\left(1 + \frac{\mathrm{SNR}}{2}\right) \approx 2\log_2(e)\frac{\mathrm{SNR}}{2} = \log_2(e)\,\mathrm{SNR} \approx \log_2(1 + \mathrm{SNR}). \tag{4.164}$$
However, the dual-polarized setup can use the multiplexing gain to achieve a
significantly higher capacity at high SNR. Since the single-polarized MIMO
channel in (4.90) has rank 1, while the dual-polarized MIMO channel in
(4.162) has rank 2, the capacity curve has a steeper slope in the latter case
and eventually provides the largest capacity. The SNR range where the dual-polarized
setup provides the highest capacity can be identified as follows:

$$2\log_2\left(1 + \frac{\mathrm{SNR}}{2}\right) \geq \log_2(1 + 4\,\mathrm{SNR}) \;\Leftrightarrow\; \left(1 + \frac{\mathrm{SNR}}{2}\right)^2 \geq 1 + 4\,\mathrm{SNR} \;\Leftrightarrow\; \mathrm{SNR} \geq 12. \tag{4.165}$$
The intersection point is $\mathrm{SNR} = 12 \approx 10.8$ dB, which can be observed in
Figure 4.50. This SNR value is six times higher than in (4.99), where single-polarized
MIMO channels with rank one and rank two were compared. The
reason for the difference is that half the power is lost over the dual-polarized
channel. In conclusion, dual-polarized antennas reduce the total received
power but create an extra dimension that increases the high-SNR capacity.
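The crossover point in (4.165) can also be located numerically. The Python sketch below runs a bisection on the difference between the two capacity curves. The difference is negative for $0 < \mathrm{SNR} < 12$ and positive above, so the bracket $[1, 100]$ contains exactly one sign change.

```python
import math


def c_single(snr):
    """Rank-one single-polarized 2x2 channel: log2(1 + 4 SNR)."""
    return math.log2(1 + 4 * snr)


def c_dual(snr):
    """Dual-polarized 2x2 channel (4.163): 2 log2(1 + SNR/2)."""
    return 2 * math.log2(1 + snr / 2)


def crossover(lo=1.0, hi=100.0, iterations=100):
    """Bisection on c_dual - c_single, which is negative at lo and positive at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if c_dual(mid) - c_single(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


snr_star = crossover()
print(round(snr_star, 6), round(10 * math.log10(snr_star), 1))  # 12.0 10.8
```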


Example 4.25. Suppose the dual-polarized transmit antenna is rotated by 45°, 
compared to the dual-polarized receive antenna, as illustrated in Figure 4.51. 
How will the channel matrix in (4.162) and channel capacity change? 

The first antenna (blue) and the second antenna (red) of the receiver
in Figure 4.51 have polarizations along the $y$-axis and $z$-axis, respectively.
By contrast, the transmitter has an antenna configuration rotated counter-clockwise
by 45° in the $yz$-plane. Each arrow points in the direction in which
the wave takes positive values. Hence, the first receive antenna obtains the
sum of the signal components along the $y$-axis, which is $(x_1 - x_2)/\sqrt{2}$. The
minus sign appears because the red arrow points toward the negative $y$-axis,
and the term $1/\sqrt{2}$ describes that only half the power is radiated in the
$y$-dimension. Similarly, the second antenna of the receiver receives the sum
of the signal components along the $z$-axis, which is $(x_1 + x_2)/\sqrt{2}$. Hence, the
MIMO channel matrix is

$$\mathbf{H} = \frac{\sqrt{\beta}}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}}_{=\mathbf{U}}\sqrt{\beta}\,\mathbf{I}_2\,\mathbf{I}_2^{\mathrm{H}}, \tag{4.166}$$

where the second equality provides the SVD. The singular values are $s_1 =
s_2 = \sqrt{\beta}$, just as with identically rotated antennas in (4.162). Therefore,
the channel capacity is the same as in (4.163) but is achieved differently.
Using Theorem 3.1, we can conclude that the capacity is achieved when
the transmitter sends two independent data streams $x_1 \sim \mathcal{N}_{\mathbb{C}}(0, q/2)$ and
$x_2 \sim \mathcal{N}_{\mathbb{C}}(0, q/2)$ and the receiver applies the receive combining $\tilde{\mathbf{y}} = \mathbf{U}^{\mathrm{H}}\mathbf{y}$
to separate them, which corresponds to compensating for the polarization
mismatch by computing $\tilde{y}_1 = (y_1 + y_2)/\sqrt{2}$ and $\tilde{y}_2 = (y_2 - y_1)/\sqrt{2}$.

Alternatively, the transmitter can generate two independent data streams
$\bar{x}_1 \sim \mathcal{N}_{\mathbb{C}}(0, q/2)$ and $\bar{x}_2 \sim \mathcal{N}_{\mathbb{C}}(0, q/2)$ and apply the transmit precoding
$\mathbf{x} = \mathbf{U}^{\mathrm{H}}\bar{\mathbf{x}}$ because $\mathbf{H}\mathbf{x} = \sqrt{\beta}\,\mathbf{U}\mathbf{U}^{\mathrm{H}}\bar{\mathbf{x}} = \sqrt{\beta}\,\bar{\mathbf{x}}$. This corresponds to using the two
rotated antennas to transmit signals with horizontal and vertical polarization.

In summary, it is sufficient to study the case with identical antenna 
rotations when characterizing the capacity with dual polarization. To achieve 
the capacity in practice, the exact channel matrix must be known so that the 
transmission can compensate for potential antenna rotations. 
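A quick numerical check of Example 4.25 is shown below as a Python sketch with hand-written 2 x 2 matrix products. It verifies that both the receive combining $\mathbf{U}^{\mathrm{H}}\mathbf{H}$ and the transmit precoding $\mathbf{H}\mathbf{U}^{\mathrm{H}}$ turn the rotated channel into $\sqrt{\beta}\,\mathbf{I}_2$. The value of $\beta$ is illustrative.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


beta = 1e-6  # illustrative channel gain
s = 1 / math.sqrt(2)
# Rows: receive antennas (y-axis, z-axis); columns: transmit antennas rotated by 45 degrees.
H = [[math.sqrt(beta) * s, -math.sqrt(beta) * s],
     [math.sqrt(beta) * s, math.sqrt(beta) * s]]
U = [[s, -s], [s, s]]
U_H = transpose(U)  # U is real, so its Hermitian transpose is its transpose

combined = matmul(U_H, H)   # receive combining: U^H H = sqrt(beta) I
precoded = matmul(H, U_H)   # transmit precoding: H U^H = sqrt(beta) I
print(combined)
print(precoded)
```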
Figure 4.51: Illustration of a setup where the transmitter and receiver are equipped with
dual-polarized antennas that differ in rotation by 45° in the yz-plane. 


4.6.2 Impact of Finite Cross-Polar Discrimination 


Even if antennas are designed to have orthogonal polarization, there is usually
cross-talk between the polarizations, for example, caused by imperfect
polarization discrimination in the individual antennas and imperfect isolation
between the co-located antennas [59], [60]. These effects can be analyzed
separately and in great detail, but that is beyond the scope of this chapter.
To study their collective impact on the channel capacity, we measure the
purity of a dual-polarized antenna by the cross-polar discrimination (XPD)
factor, which is defined as the ratio between the power radiated into the
intended polarization direction and the power radiated into the orthogonal
polarization direction. We will not distinguish whether this issue arises
because the intended antenna partially radiates into the unintended polarization
direction or because the signal leaks into the co-located opposite-polarized antenna
and is then radiated into the unintended polarization direction. The XPD
correspondingly affects the reception, so there is symmetry in the system.
We will now consider polarized antennas that transmit a fraction $(1-\gamma)$ of
the total power into the intended polarization (for any of the reasons above)
and a fraction $\gamma$ into the opposite polarization. The parameter $\gamma \in [0,1]$
characterizes the impurity of the antenna (a smaller value is better). Note
that $(1-\gamma) + \gamma = 1$, which implies that the total power is divided between
the two orthogonal polarizations without losses. The XPD of such an antenna
is

$$\mathrm{XPD} = \frac{1-\gamma}{\gamma} \quad\Leftrightarrow\quad \gamma = \frac{1}{1+\mathrm{XPD}}. \tag{4.167}$$

A larger value of $\gamma$ corresponds to a smaller XPD and vice versa.

Suppose we transmit a signal with power $P$ to a polarized receive antenna
of the same kind. A signal component with power $(1-\gamma)P$ is radiated with
the intended polarization. If the channel gain is $\beta \in [0,1]$, a fraction $\beta$ of
the transmitted power reaches the receive antenna and a fraction $(1-\gamma)$ is
properly received. Hence, the received signal component has power $(1-\gamma)^2P\beta$.
Moreover, a signal component with power $\gamma P$ will be radiated using the
opposite polarization, and the receive antenna will then capture a fraction $\gamma\beta$
of it. Hence, the total received power is

$$(1-\gamma)^2P\beta + \gamma^2P\beta = \left(1 - 2(1-\gamma)\gamma\right)P\beta. \tag{4.168}$$


Suppose the receive antenna instead has the opposite polarization direction but the same XPD. It will then receive two signal components: one that leaks into the wrong polarization at the transmitter and one that leaks into the wrong polarization at the receiver. Due to the assumed XPD symmetry, each of them has power $(1-\gamma)\gamma P\beta$; thus, the total received power is

$$(1-\gamma)\gamma P\beta + (1-\gamma)\gamma P\beta = 2(1-\gamma)\gamma P\beta. \tag{4.169}$$

Note that the sum of (4.168) and (4.169) is $P\beta$; thus, a dual-polarized receive
antenna can capture all the signal power that reaches the receiver, irrespective 
of the XPD. This observation can be extended to the case when the transmitter 
is a user device with an arbitrary orientation. The combined effect of the 
orientation and XPD will determine how the received power is distributed 
over the two polarizations of the dual-polarized receive antenna. However, 
the total power of the impinging wave over the antenna’s effective area will 
always be captured. This is a somewhat intuitive result but requires much 
more notation to formalize mathematically; thus, it has been omitted here. 
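The power bookkeeping in (4.167) to (4.169) can be checked numerically. The sketch below is a minimal illustration, assuming the XPD is given in dB and converted to a linear ratio before applying (4.167); the function names and the values of $P$ and $\beta$ are hypothetical examples.

```python
def xpd_db_to_gamma(xpd_db):
    """Leakage fraction gamma = 1 / (1 + XPD) from (4.167), with the XPD given in dB."""
    xpd_linear = 10 ** (xpd_db / 10)
    return 1 / (1 + xpd_linear)


def received_powers(P, beta, gamma):
    """Received power at a co-polarized antenna (4.168) and an opposite-polarized antenna (4.169)."""
    co_polarized = (1 - gamma) ** 2 * P * beta + gamma ** 2 * P * beta
    cross_polarized = 2 * (1 - gamma) * gamma * P * beta
    return co_polarized, cross_polarized


P, beta = 1.0, 1e-6  # hypothetical transmit power (W) and channel gain
for xpd_db in (0, 10, 20, 30):
    gamma = xpd_db_to_gamma(xpd_db)
    co, cross = received_powers(P, beta, gamma)
    # The two polarizations together always collect P * beta, irrespective of the XPD
    print(f"XPD = {xpd_db:2d} dB: gamma = {gamma:.4f}, "
          f"co = {co:.3e} W, cross = {cross:.3e} W, "
          f"(co + cross) / (P * beta) = {(co + cross) / (P * beta):.6f}")
```

Only the split between the two receive polarizations depends on the XPD; the sum stays at $P\beta$.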

We will now study the impact of the XPD on the MIMO channel capacity. 
The discussion above is related to the power of signals, while the channel 
matrix describes how the amplitude and phase change. Hence, based on (4.168) 
and (4.169), we can write the channel matrix as 


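$$\mathbf{H} = \begin{bmatrix} \sqrt{(1-\kappa)\beta} & \sqrt{\kappa\beta} \\ \sqrt{\kappa\beta} & \sqrt{(1-\kappa)\beta} \end{bmatrix} = \sqrt{\beta}\begin{bmatrix} \sqrt{1-\kappa} & \sqrt{\kappa} \\ \sqrt{\kappa} & \sqrt{1-\kappa} \end{bmatrix}, \tag{4.170}$$

where $\beta$ still denotes the channel gain and we have defined

$$\kappa = 2(1-\gamma)\gamma = \frac{2\,\mathrm{XPD}}{(1+\mathrm{XPD})^2} \tag{4.171}$$

as the total fraction of power that leaks from a transmitted signal with one polarization to a received signal with the opposite polarization. The derived model is equivalent to the ones presented in [59], [60]. Note that $\kappa \in [0, 0.5]$, where the largest value is achieved for $\gamma = 1/2$ and $\mathrm{XPD} = 1$. The channel matrix in (4.170) reduces to (4.162) in the special case of $\kappa = 0$ when the antenna polarizations are pure. The SVD of the channel matrix in (4.170) is¹⁷

$$\mathbf{H} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \sqrt{\beta}\left(\sqrt{1-\kappa}+\sqrt{\kappa}\right) & 0 \\ 0 & \sqrt{\beta}\left(\sqrt{1-\kappa}-\sqrt{\kappa}\right) \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}^{\mathrm{H}}. \tag{4.172}$$


¹⁷ The SVD of $\mathbf{H}$ coincides with its eigendecomposition since $\mathbf{H}$ is a positive semi-definite Hermitian matrix.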
Figure 4.52: The capacity of a 2 × 2 MIMO channel with dual-polarized antennas is affected
by the XPD, but the effect is negligible for large XPD values. 


This implies that the MIMO channel can be divided into two parallel channels with $s_1 = \sqrt{\beta}(\sqrt{1-\kappa}+\sqrt{\kappa})$ and $s_2 = \sqrt{\beta}(\sqrt{1-\kappa}-\sqrt{\kappa})$ as the singular values. The channel capacity is achieved by transmitting each signal over both polarizations, using the precoding vectors $[1/\sqrt{2} \ \ 1/\sqrt{2}]^{\mathrm{T}}$ and $[1/\sqrt{2} \ \ {-1}/\sqrt{2}]^{\mathrm{T}}$. The same vectors are also utilized for receive combining.
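As a numerical sanity check of (4.170) to (4.172), the following sketch builds the 2 × 2 channel matrix for a given XPD, computes its singular values with NumPy, and compares them with the closed-form $s_1$ and $s_2$. It assumes NumPy is available and uses $\beta = 1$ and XPD = 10 dB as arbitrary example values; the helper names are hypothetical.

```python
import numpy as np


def kappa_from_xpd(xpd_linear):
    """Total cross-polar leakage kappa = 2*XPD / (1 + XPD)^2 from (4.171)."""
    return 2 * xpd_linear / (1 + xpd_linear) ** 2


def dual_pol_channel(beta, kappa):
    """2x2 channel matrix in (4.170) for one dual-polarized antenna on each side."""
    co, cross = np.sqrt(1 - kappa), np.sqrt(kappa)
    return np.sqrt(beta) * np.array([[co, cross], [cross, co]])


beta = 1.0
kappa = kappa_from_xpd(10 ** (10 / 10))  # XPD = 10 dB on a linear scale
H = dual_pol_channel(beta, kappa)

# Singular values from a numerical SVD (returned in descending order)
s_numeric = np.linalg.svd(H, compute_uv=False)
# Closed-form singular values s1 and s2
s_closed = np.sqrt(beta) * np.array([np.sqrt(1 - kappa) + np.sqrt(kappa),
                                     np.sqrt(1 - kappa) - np.sqrt(kappa)])
print("kappa =", round(kappa, 4))
print("numerical SVD:", s_numeric)
print("closed form:  ", s_closed)

# The precoding/combining vectors are eigenvectors of the symmetric matrix H
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)
print(np.allclose(H @ v1, s_closed[0] * v1), np.allclose(H @ v2, s_closed[1] * v2))
```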

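The capacity can be computed using (3.75) as

$$C = \log_2\left(1 + \frac{q_1^{\mathrm{opt}} s_1^2}{N_0}\right) + \log_2\left(1 + \frac{q_2^{\mathrm{opt}} s_2^2}{N_0}\right), \tag{4.173}$$

where $q_i^{\mathrm{opt}} = \max\left(\mu - \frac{N_0}{s_i^2}, 0\right)$ is the transmit power obtained from the water-filling power allocation, with the water level $\mu$ selected so that $q_1^{\mathrm{opt}} + q_2^{\mathrm{opt}} = q$. It follows from Corollary 3.3 that

$$q_1^{\mathrm{opt}} = \begin{cases} q, & \text{if } q \leq \frac{N_0}{s_2^2} - \frac{N_0}{s_1^2}, \\ \frac{q}{2} + \frac{N_0}{2s_2^2} - \frac{N_0}{2s_1^2}, & \text{otherwise}, \end{cases} \tag{4.174}$$

$$q_2^{\mathrm{opt}} = \begin{cases} 0, & \text{if } q \leq \frac{N_0}{s_2^2} - \frac{N_0}{s_1^2}, \\ \frac{q}{2} + \frac{N_0}{2s_1^2} - \frac{N_0}{2s_2^2}, & \text{otherwise}, \end{cases} \tag{4.175}$$

where both channels are only utilized if the transmit power is above a threshold.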
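The closed-form water-filling solution in (4.174) and (4.175) is easy to implement and to cross-check against a brute-force search over all power splits $q_1 + q_2 = q$. The sketch below assumes $s_1 \geq s_2 \geq 0$ and treats $s_2 = 0$ as the case where only the first channel is used; the function names and the example value $\kappa = 0.165$ (roughly XPD = 10 dB) are illustrative choices.

```python
import numpy as np


def dual_pol_capacity(q, s1, s2, N0):
    """Capacity (4.173) using the water-filling powers (4.174)-(4.175).

    Assumes s1 >= s2 >= 0. Returns (C, q1, q2) with C in bit/symbol.
    """
    if s2 == 0 or q <= N0 / s2**2 - N0 / s1**2:
        q1, q2 = q, 0.0  # only the stronger channel is active
    else:
        q1 = q / 2 + N0 / (2 * s2**2) - N0 / (2 * s1**2)
        q2 = q / 2 + N0 / (2 * s1**2) - N0 / (2 * s2**2)
    C = np.log2(1 + q1 * s1**2 / N0) + np.log2(1 + q2 * s2**2 / N0)
    return C, q1, q2


def brute_force_capacity(q, s1, s2, N0, points=100001):
    """Maximize the sum rate in (4.173) over a fine grid of power splits q1 + q2 = q."""
    q1 = np.linspace(0, q, points)
    rates = np.log2(1 + q1 * s1**2 / N0) + np.log2(1 + (q - q1) * s2**2 / N0)
    return rates.max()


kappa, beta, N0 = 0.165, 1.0, 1.0  # example values
s1 = np.sqrt(beta) * (np.sqrt(1 - kappa) + np.sqrt(kappa))
s2 = np.sqrt(beta) * (np.sqrt(1 - kappa) - np.sqrt(kappa))
for q in (0.1, 1.0, 10.0, 100.0):
    C, q1, q2 = dual_pol_capacity(q, s1, s2, N0)
    print(f"q = {q:6.1f}: C = {C:.4f} bit/symbol (q1 = {q1:.3f}, q2 = {q2:.3f}), "
          f"grid search = {brute_force_capacity(q, s1, s2, N0):.4f}")
```

For small $q$ the whole budget goes to the stronger subchannel, matching the threshold in (4.174) and (4.175).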

Figure 4.52 illustrates the impact of XPD on the MIMO capacity. The ideal case of $\kappa = 0$ is compared with XPD = 20 dB ($\kappa \approx 0.02$) and XPD = 10 dB ($\kappa \approx 0.17$), where the transformation from XPD to $\kappa$ is achieved using (4.171) with the XPD expressed as a linear ratio. When the XPD is large, the relation becomes $\kappa \approx 2/\mathrm{XPD}$. The figure shows that the polarization impurity caused by having a low XPD (such as 10 dB) results in a capacity reduction at high SNR. The multiplexing gain is the same, as seen from the identical slopes of the curves, but the curve is shifted to the right, indicating a power loss from having singular values of different sizes. In contrast, the polarization impurity results in a minor capacity improvement at low SNR, where only the largest singular value $s_1$ is utilized, and $s_1$ is an increasing function of $\kappa$ (in the range $[0, 0.5]$ of possible parameter values). When the XPD reaches 20 dB (or more), it has a negligible impact on the capacity. Many practical antennas operate in that regime.
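The trends described for Figure 4.52 can be reproduced approximately with a few lines of code. The sketch below assumes $\beta = N_0 = 1$, so that the SNR equals $q$, converts the two XPD values to $\kappa$ with (4.171), compares them with the approximation $2/\mathrm{XPD}$, and evaluates the water-filling capacity (4.173) at a low and a high SNR. It is an illustration only, not the code used to generate the figure.

```python
import numpy as np


def kappa_from_xpd_db(xpd_db):
    """Exact mapping (4.171) after converting the XPD from dB to a linear ratio."""
    xpd = 10 ** (xpd_db / 10)
    return 2 * xpd / (1 + xpd) ** 2


def capacity(snr, kappa):
    """Water-filling capacity (4.173) with beta = N0 = 1, so that SNR = q."""
    s1_sq = (np.sqrt(1 - kappa) + np.sqrt(kappa)) ** 2
    s2_sq = (np.sqrt(1 - kappa) - np.sqrt(kappa)) ** 2
    if s2_sq == 0 or snr <= 1 / s2_sq - 1 / s1_sq:
        return np.log2(1 + snr * s1_sq)  # only the strongest channel is active
    q1 = snr / 2 + 1 / (2 * s2_sq) - 1 / (2 * s1_sq)
    q2 = snr / 2 + 1 / (2 * s1_sq) - 1 / (2 * s2_sq)
    return np.log2(1 + q1 * s1_sq) + np.log2(1 + q2 * s2_sq)


for xpd_db in (20, 10):
    k = kappa_from_xpd_db(xpd_db)
    print(f"XPD = {xpd_db} dB: kappa = {k:.4f}, approximation 2/XPD = {2 / 10 ** (xpd_db / 10):.4f}")

for snr_db in (-10, 30):
    snr = 10 ** (snr_db / 10)
    rates = {xpd: capacity(snr, kappa_from_xpd_db(xpd)) for xpd in (20, 10)}
    print(f"SNR = {snr_db} dB: ideal {capacity(snr, 0.0):.3f}, "
          f"20 dB XPD {rates[20]:.3f}, 10 dB XPD {rates[10]:.3f} bit/symbol")
```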


4.6.3 MIMO Channel Capacity with Dual-Polarized ULAs 


We will now consider a MIMO setup with arrays of dual-polarized antennas 
at both the transmitter and receiver. To keep the notation simple, we assume 
that the transmitter and receiver are equipped with ULAs located in the same 
two-dimensional plane (e.g., at the same height above the ground) and the 
antenna spacing is $\Delta = \lambda/2$. The transmitter has $K/2$ dual-polarized antennas, where $K$ is an even number representing the total number of transmit antennas (counting both polarizations). The receiver has $M/2$ dual-polarized antennas, where $M$ is an even number representing the total number of receive antennas. The XPD is characterized by the parameter $\kappa \in [0, 0.5]$ defined in (4.171).


We order the antennas according to their polarization so that transmit antennas $1, \ldots, K/2$ and receive antennas $1, \ldots, M/2$ form an $(M/2) \times (K/2)$ MIMO channel of the kind studied in Section 4.4. The same applies to transmit antennas $K/2+1, \ldots, K$ and receive antennas $M/2+1, \ldots, M$. Under the same frequency-flatness, far-field assumptions, and angle definitions as in Section 4.4.1, the channel matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ can be expressed using (4.91) as


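$$\mathbf{H} = \sqrt{\beta}\begin{bmatrix} \sqrt{1-\kappa}\,\mathbf{a}_{M/2}(\varphi_r)\mathbf{a}_{K/2}^{\mathrm{T}}(\varphi_t) & \sqrt{\kappa}\,\mathbf{a}_{M/2}(\varphi_r)\mathbf{a}_{K/2}^{\mathrm{T}}(\varphi_t) \\ \sqrt{\kappa}\,\mathbf{a}_{M/2}(\varphi_r)\mathbf{a}_{K/2}^{\mathrm{T}}(\varphi_t) & \sqrt{1-\kappa}\,\mathbf{a}_{M/2}(\varphi_r)\mathbf{a}_{K/2}^{\mathrm{T}}(\varphi_t) \end{bmatrix} = \sqrt{\beta}\begin{bmatrix} \sqrt{1-\kappa} & \sqrt{\kappa} \\ \sqrt{\kappa} & \sqrt{1-\kappa} \end{bmatrix} \otimes \left(\mathbf{a}_{M/2}(\varphi_r)\mathbf{a}_{K/2}^{\mathrm{T}}(\varphi_t)\right), \tag{4.176}$$

where $\mathbf{a}_{M/2}(\varphi)$ is the array response vector defined in (4.49). We notice that $\mathbf{H}$ is the Kronecker product between the channel matrix in (4.170) for two dual-polarized antennas and an $(M/2) \times (K/2)$ MIMO channel matrix $\mathbf{a}_{M/2}(\varphi_r)\mathbf{a}_{K/2}^{\mathrm{T}}(\varphi_t)$ with single-polarized antennas. Similarly to (4.172), the channel matrix in (4.176) can be factorized as

$$\mathbf{H} = \begin{bmatrix} \frac{\mathbf{a}_{M/2}(\varphi_r)}{\sqrt{M}} & \frac{\mathbf{a}_{M/2}(\varphi_r)}{\sqrt{M}} \\ \frac{\mathbf{a}_{M/2}(\varphi_r)}{\sqrt{M}} & -\frac{\mathbf{a}_{M/2}(\varphi_r)}{\sqrt{M}} \end{bmatrix} \begin{bmatrix} s_1 & 0 \\ 0 & s_2 \end{bmatrix} \begin{bmatrix} \frac{\mathbf{a}_{K/2}^{*}(\varphi_t)}{\sqrt{K}} & \frac{\mathbf{a}_{K/2}^{*}(\varphi_t)}{\sqrt{K}} \\ \frac{\mathbf{a}_{K/2}^{*}(\varphi_t)}{\sqrt{K}} & -\frac{\mathbf{a}_{K/2}^{*}(\varphi_t)}{\sqrt{K}} \end{bmatrix}^{\mathrm{H}}. \tag{4.177}$$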
Figure 4.53: Comparison of the capacities of MIMO channels with either 4 or 8 single-polarized
antennas on each side, or 2 or 4 dual-polarized antennas on each side (which is also 4 or 8 
antennas). 


This is the SVD, where the two non-zero singular values are

$$s_1 = \frac{\sqrt{\beta M K}}{2}\left(\sqrt{1-\kappa} + \sqrt{\kappa}\right), \tag{4.178}$$

$$s_2 = \frac{\sqrt{\beta M K}}{2}\left(\sqrt{1-\kappa} - \sqrt{\kappa}\right). \tag{4.179}$$


Comparing with the case $M = K = 2$ in (4.172), we notice that having multiple dual-polarized antennas at the transmitter and receiver results in a beamforming gain of $MK/4$. This is the product of the number of transmit and receive antennas of each polarization. However, the multiplexing gain remains limited to $r = 2$ and is created by the different antenna polarizations. In other words, the varying polarization does not resolve the issue that far-field MIMO LOS channels have low rank since there is only one path between the transmitter and receiver. The capacity is achieved by using MRT with the precoding vector $\mathbf{a}_{K/2}^{*}(\varphi_t)/\sqrt{K}$ for each polarization at the transmitter side and MRC with the combining vector $\mathbf{a}_{M/2}(\varphi_r)/\sqrt{M}$ for each polarization at the receiver side. The polarization multiplexing is achieved by using the same or different signs in front of these vectors.
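The Kronecker structure in (4.176) and the singular values in (4.178) and (4.179) can be verified numerically. The sketch assumes that the ULA array response vector has entries $e^{-j2\pi(\Delta/\lambda)n\sin(\varphi)}$ for $n = 0, \ldots, N-1$; this is a common convention assumed here rather than quoted from (4.49), and the singular values do not depend on the sign of the phase. The values of $M$, $K$, $\beta$, $\kappa$, and the angles are arbitrary examples.

```python
import numpy as np


def ula_response(N, phi, spacing_in_wavelengths=0.5):
    """Assumed ULA array response: entry n is exp(-j*2*pi*(Delta/lambda)*n*sin(phi))."""
    n = np.arange(N)
    return np.exp(-2j * np.pi * spacing_in_wavelengths * n * np.sin(phi))


def dual_pol_ula_channel(M, K, beta, kappa, phi_r, phi_t):
    """Channel matrix (4.176): polarization matrix (4.170) Kronecker a_{M/2} a_{K/2}^T."""
    a_r = ula_response(M // 2, phi_r)
    a_t = ula_response(K // 2, phi_t)
    pol = np.array([[np.sqrt(1 - kappa), np.sqrt(kappa)],
                    [np.sqrt(kappa), np.sqrt(1 - kappa)]])
    H = np.sqrt(beta) * np.kron(pol, np.outer(a_r, a_t))  # np.outer: no conjugation
    return H, a_r, a_t


M, K, beta, kappa = 8, 8, 1.0, 0.1  # example values
H, a_r, a_t = dual_pol_ula_channel(M, K, beta, kappa, phi_r=0.3, phi_t=-0.2)

s = np.linalg.svd(H, compute_uv=False)
s1 = np.sqrt(beta * M * K) / 2 * (np.sqrt(1 - kappa) + np.sqrt(kappa))  # (4.178)
s2 = np.sqrt(beta * M * K) / 2 * (np.sqrt(1 - kappa) - np.sqrt(kappa))  # (4.179)
print("largest singular values:", s[:3])
print("closed form:", s1, s2)

# MRT per polarization with equal signs (first subchannel) and MRC per polarization
v1 = np.concatenate([a_t.conj(), a_t.conj()]) / np.sqrt(K)
u1 = np.concatenate([a_r, a_r]) / np.sqrt(M)
print("H v1 = s1 u1:", np.allclose(H @ v1, s1 * u1))
```

Flipping the sign of the second half of both vectors gives the second subchannel with gain $s_2$; all other singular values are numerically zero.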

Figure 4.53 compares the capacities achieved with single-polarized and dual-polarized antennas with $\kappa = 0$. There are curves for $M = K = 8$ and $M = K = 4$. We previously observed in Figure 4.50 that the single-polarized setup has a benefit at low SNR because a higher beamforming gain is achievable when all antennas have matching polarization. This property remains when the number of antennas increases, but it only occurs at lower SNRs. When the SNR is greater than 0 dB in Figure 4.53, the improved multiplexing gain that dual-polarized antennas offer almost always compensates for the reduced beamforming gain. Hence, antenna arrays and dual polarization are a good combination when designing point-to-point MIMO LOS systems.


Example 4.26. Consider the MIMO channel with dual-polarized ULAs, whose channel matrix is given in (4.176). Compute the channel capacity with $\kappa = 0$ (best-case XPD) and $\kappa = 0.5$ (worst-case XPD). For which values of $\mathrm{SNR} = q\beta/N_0$ is $\kappa = 0$ giving the largest capacity?


The two non-zero singular values are given in (4.178) and (4.179). If $\kappa = 0$, we get $s_1 = s_2 = \sqrt{\beta MK}/2$, and then the water-filling power allocation gives $q_1 = q_2 = q/2$. The resulting channel capacity is

$$C_{\mathrm{dual},\kappa=0} = 2\log_2\left(1 + \frac{q\beta MK}{8N_0}\right) = 2\log_2\left(1 + \frac{MK}{8}\mathrm{SNR}\right). \tag{4.180}$$

If $\kappa = 0.5$, we instead have $s_1 = \sqrt{\beta MK/2}$ and $s_2 = 0$. Hence, only a single subchannel is activated ($q_1 = q$), which leads to the channel capacity

$$C_{\mathrm{dual},\kappa=0.5} = \log_2\left(1 + \frac{q\beta MK}{2N_0}\right) = \log_2\left(1 + \frac{MK}{2}\mathrm{SNR}\right). \tag{4.181}$$

This worst-case XPD scenario gives a smaller multiplexing gain but a larger beamforming gain. Yet, the beamforming gain is only half that achieved by the single-polarized MIMO channel in (4.96), so dual-polarized antennas are not desirable for pure beamforming.


We can identify the SNR range where $C_{\mathrm{dual},\kappa=0} > C_{\mathrm{dual},\kappa=0.5}$ as follows:

$$2\log_2\left(1 + \frac{MK}{8}\mathrm{SNR}\right) > \log_2\left(1 + \frac{MK}{2}\mathrm{SNR}\right) \;\Leftrightarrow\; \left(1 + \frac{MK}{8}\mathrm{SNR}\right)^2 > 1 + \frac{MK}{2}\mathrm{SNR} \;\Leftrightarrow\; \frac{(MK)^2}{64}\mathrm{SNR}^2 > \frac{MK}{4}\mathrm{SNR} \;\Leftrightarrow\; \mathrm{SNR} > \frac{16}{MK}. \tag{4.182}$$

The more antennas are used, the larger the SNR range where the setup with $\kappa = 0$ outperforms the setup with $\kappa = 0.5$.
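The threshold in (4.182) can be checked by evaluating (4.180) and (4.181) on either side of $\mathrm{SNR} = 16/(MK)$. The sketch below does this for the antenna numbers used in Figure 4.53; the function names are illustrative.

```python
import math


def c_dual_best_xpd(snr, M, K):
    """Capacity (4.180) with kappa = 0."""
    return 2 * math.log2(1 + M * K / 8 * snr)


def c_dual_worst_xpd(snr, M, K):
    """Capacity (4.181) with kappa = 0.5."""
    return math.log2(1 + M * K / 2 * snr)


for M, K in [(4, 4), (8, 8)]:
    threshold = 16 / (M * K)  # (4.182)
    print(f"M = K = {M}: threshold SNR = {threshold:.4f} ({10 * math.log10(threshold):.2f} dB)")
    for factor in (0.5, 1.0, 2.0):
        snr = factor * threshold
        print(f"  SNR = {snr:.4f}: kappa = 0 gives {c_dual_best_xpd(snr, M, K):.4f}, "
              f"kappa = 0.5 gives {c_dual_worst_xpd(snr, M, K):.4f} bit/symbol")
```

At the threshold, both expressions equal $\log_2(9)$ irrespective of $M$ and $K$, which is a quick way to confirm the algebra in (4.182).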
4.7 Exercises


Exercise 4.1. Consider a ULA with M antennas that receive a signal from a single-antenna transmitter located at a distance $d_1$ in the angular direction $\varphi$. A general relationship between the distances $d_1$ and $d_m$ is given in (4.13) for spherical waves, while the corresponding expression for plane waves is provided in (4.17). Show that we can obtain (4.17) as an approximation of (4.13) when $d_1 \gg M\Delta$. Hint: Use the Taylor
approximation $\sqrt{1+x} \approx 1 + x/2$ that is tight for $0 < x < 0.25$, which was previously
considered in Section 1.1.2.


Exercise 4.2. Consider a SIMO system with isotropic antennas operating over a far-field 
LOS channel. The transmit power is P, the bandwidth is B, and there are M receive 
antennas. 


(a) State the channel capacity expression as a function of M, P, B, the wavelength $\lambda$, the propagation distance d, and the noise power spectral density $N_0$.


(b) Suppose M = 1, P = 1 W, B = 10 MHz, \ = 10cm, and No = 107" W/Hz. At 
what distance d is the channel capacity 10 Mbit/s? 


(c) We want to increase the number of antennas to achieve a channel capacity of 
100 Mbit/s at the same distance as in (b). How many antennas are needed if 
the other parameters are unchanged? How large is the total effective area of the 
antennas in the receiver array? 


(d) We now reduce the wavelength to $\lambda = 1$ cm. How many antennas are needed to
achieve a channel capacity of 100 Mbit/s at the same distance as in (b)? How 
large is the total effective area of the antennas in the receiver array? 


Exercise 4.3. Consider a ULA with M = 10 antennas and half-wavelength antenna 
spacing. Suppose it beamforms in the broadside direction in a free-space LOS scenario. 


(a) What is the beamforming gain obtained in the direction $\varphi = 0$?
(b) What is the beamforming gain obtained in the direction $\varphi = \pi/6$?


Exercise 4.4. Reproduce the exact curve in Figure 4.13 but for a ULA with cosine an- 
tennas. Use the simulation results to discuss what happens to the half-power beamwidth, 
the amplification beamwidth, and the first-null beamwidth compared to having isotropic 
antennas. Which of these becomes smaller, wider, or unchanged in this example? 


Exercise 4.5. Consider a MISO channel with a ULA having M antennas and $\Delta = \lambda/2$. The ULA transmits a signal in the end-fire direction $\varphi_{\mathrm{beam}} = \pi/2$.


(a) If the receiver is located in another angular direction $\varphi$, what is the beamforming gain?


(b) Compute an approximate expression for the first-null beamwidth using the Taylor approximation $\arcsin(x) \approx x$, which is very tight if $M > 5$. Hint: Consider angular directions close to $\pm\pi/2$ by setting $\varphi = \pm\pi/2 + x$ and looking for small $x$. Use that $\sin(\pm\pi/2 + x) = \pm(1 - 2\sin^2(x/2))$.


(c) Compare the result with the beamwidth 4/M in (4.62) for broadside beamforming. 
Is the beamwidth smaller when transmitting in the broadside or end-fire direction? 
Does the beamwidth for end-fire beamforming decrease when M increases? 
Exercise 4.6. A base station with M = 4 antennas transmits to a single-antenna device located at an angle $\varphi$. Suppose the channel vector is

$$\mathbf{h} = \sqrt{\beta}\begin{bmatrix} 1 \\ e^{-j\pi\sin(\varphi)} \\ e^{-j2\pi\sin(\varphi)} \\ e^{-j3\pi\sin(\varphi)} \end{bmatrix}. \tag{4.183}$$


(a) What is the capacity of this channel when $P\beta/(BN_0) = 10$? Explain how the capacity value depends on $\varphi$.


(b) Suppose the base station believes the user is located at $\varphi_{\mathrm{beam}} = 0^\circ$ and transmits using MRT. If the true angle of the user is $\varphi = 60^\circ$, what is the achievable data rate? Compare the result with (a).


(c) Repeat (b) but with $\varphi = 30^\circ$. Explain the result.


Exercise 4.7. It is possible to create other grids of orthogonal angular beams than the 
DFT beams defined in Section 4.3.3. Suppose we construct M beams using the angles 


ee) 


4.184 
uM (4.184) 


y = arcsin ( 


for some 0 < a < 1 and for the integers n satisfying H Ssn a § (there are 


M such integers). Show that these beams are mutually orthogonal when using a ULA with $\Delta = \lambda/2$.


Exercise 4.8. An M-antenna ULA receives the signal from $\varphi = -\pi/6$ and uses MRC. An interfering signal arrives from the angle $\varphi_{\mathrm{interf}} = -\pi/9$.


(a) Obtain the sinc approximation of the beamforming gain in (4.31) for $\varphi = -\pi/6$ and $\varphi_{\mathrm{interf}} = -\pi/9$ by using that $\sin(x) \approx x$ for arguments close to zero.


(b) Use the obtained sinc expression from (a) to determine how many antennas are needed to ensure that the interfering transmitter is outside the half-power beamwidth if $\Delta = \lambda/2$.


(c) Repeat (b) for $\Delta = \lambda$.


Exercise 4.9. Consider a ULA with M antennas deployed to transmit beams toward 
user devices located in angular directions between 30° and 60°. When doing so, grating 
lobes are allowed if they do not appear in the angular interval $\varphi \in [10^\circ, 80^\circ]$. How should
the antenna spacing be selected to achieve the smallest beamwidth? 


Exercise 4.10. Consider a MIMO LOS channel where the transmitter is equipped with a ULA with K = 4 antennas and antenna spacing $\Delta = \lambda/2$. The receiver is equipped with M = 4 distributed antennas deployed along the arc of a circle with radius d (similar to Figure 4.26). The antennas are located in the angular directions $\varphi_1 = 0$, $\varphi_2 = \pi/6$, $\varphi_3 = \pi/2$, and $\varphi_4 = -\pi/6$ as seen from the transmitter. Compute the channel capacity in terms of $q$, $\beta$, and $N_0$. Hint: Express the channel matrix using array response vectors and show that these are mutually orthogonal.


Exercise 4.11. Consider an array with three isotropic antennas deployed at the corners of an equilateral triangle with the side length $\Delta$. The antennas are placed in the yz-plane. Compute an expression for the array response vector $\mathbf{a}(\varphi, \theta)$.
Exercise 4.12. Consider a UPA transmitter with $M_H = 10$ horizontal antennas along the y-axis and $M_V = 4$ vertical antennas along the z-axis. Each antenna has the cosine gain function given in (4.151). The single-antenna receiver is in the direction $(\varphi, \theta) = (\pi/6, \pi/6)$.
(a) What is the joint antenna and beamforming gain achieved by mechanical beamforming?
(b) What is the joint antenna and beamforming gain achieved by electrical beamforming?
(c) Will the results in (a) and (b) change if the UPA instead has $M_H = 5$ horizontal antennas and $M_V = 8$ vertical antennas?


Exercise 4.13. Suppose we are building a MISO system that will operate under a maximum EIRP limit of 68 dBm. We use cosine antennas and let P denote the total transmit power. The power consumption of the system is measured as

$$\frac{P}{0.25} + M \cdot 1 + P_{\mathrm{circuit}} \ \ \mathrm{W}, \tag{4.185}$$

where the first term models signal transmission with a power amplifier efficiency of 25%, the second term models that the transceiver hardware connected to each antenna consumes 1 W, and the fixed term $P_{\mathrm{circuit}} > 0$ models the remaining power consumption. Which combination of P and M will reach the EIRP limit while minimizing the power consumption in (4.185)?


Exercise 4.14. Consider a point-to-point LOS channel. Suppose the wavelength is $\lambda = 0.1$ m, the transmit power is P = 10 W, the bandwidth is B = 100 MHz, and the noise power spectral density is $N_0 = 10^{-17}$ W/Hz.


(a) If M = K = 1 isotropic antennas are used, what is the capacity when the 
propagation distance is d = 100 m? 


(b) If M = 1 isotropic antenna is used at the receiver, how many isotropic antennas, K, are needed at the transmitter to reach the same data rate as in (a) at the propagation distance d = 400 m?


(c) Assuming the transmitter has K isotropic antennas, where K is obtained from (b), how many isotropic antennas are needed at the receiver, M, to reach the same data rate as in (a) at the propagation distance d = 800 m?


(d) Is it possible to reach the same data rate as in (a) at the propagation distance d = 800 m by using a smaller total number of antennas M + K than in (c)? What are M and K in that case?


Exercise 4.15. A UPA transmits a signal in the direction $(\varphi_{\mathrm{beam}}, \theta_{\mathrm{beam}})$, where $\varphi_{\mathrm{beam}} \in [-\pi/2, \pi/2]$ is the azimuth angle and $\theta_{\mathrm{beam}} \in [-\pi/2, \pi/2]$ is the elevation angle, using MRT.


(a) Compute the first-null beamwidths in the horizontal plane (i.e., $\theta = \theta_{\mathrm{beam}}$) and vertical plane (i.e., $\varphi = \varphi_{\mathrm{beam}}$).

(b) Suppose $M_H = 10$, $M_V = 4$, and $\Delta = \lambda/2$. If the UPA beamforms in the direction $\varphi_{\mathrm{beam}} = 0$, $\theta_{\mathrm{beam}} = \pi/10$, what is the first-null beamwidth in the horizontal plane ($\theta = \pi/10$) and vertical plane ($\varphi = 0$)? Compare the beamwidths with those achieved for $\varphi_{\mathrm{beam}} = 0$, $\theta_{\mathrm{beam}} = 0$ in Example 4.20.
Exercise 4.16. Consider a MIMO setup with two parallel M-antenna ULAs that are
separated by a distance d. The transmitter and receiver have the antenna spacings A 
and A/2, respectively. If the first antennas of the two arrays are aligned, the distance between
transmit antenna k and receive antenna m becomes 


2 4 _ KA)? à _ KA)? 
dma = P+ (mà kA) =d\/14 (m3 ) va (14 Ce 


d 2d? 
(4.186) 


Use this approximation and assume that the channel gain is the same between every transmit
and receive antenna pair.


(a) Find an antenna spacing A that makes all the singular values of the channel 
matrix equal. Hint: Follow the approach in Section 4.4.3 but with different antenna 
spacings at the transmitter and receiver. 


(b) Consider the nulls of the beam pattern in (4.60). What is the physical distance 
between the nulls at the distance d from the transmitter? Compare it to A in (a). 


Exercise 4.17. Consider a free-space point-to-point MIMO channel. The numbers of antennas at the transmitter and receiver are K = 8 and M = 16, respectively. The angles-of-arrival and angles-of-departure are the same and given as $\varphi_r = \varphi_t = \pi/3$ and $\theta_r = \theta_t = 0$. Suppose that $q = 10^{-8}$ W/Hz, $N_0 = 10^{-17}$ W/Hz, and $\lambda = 0.1$ m.


(a) At what propagation distance d is the channel capacity C = 7 bit/symbol if
single-polarized isotropic antennas are used? 


(b) Suppose single-polarized cosine antennas with the antenna gain function in (4.151) 
are used at both the transmitter and receiver. At what distance d is the channel 
capacity the same as in (a)? 


(c) At what distance d is the same channel capacity as in (a) achieved when the 
transmitter and receiver use dual-polarized cosine antennas with the antenna gain 
function in (4.151) and $\kappa = 0$ (i.e., best-case XPD)?


Exercise 4.18. A packet of symbols is transmitted over the SIMO channel in (4.32). Suppose the transmitter sends the constant $\sqrt{q}$ symbol $L_p$ times, as explained in Section 4.2.5, so that the ULA receiver can estimate the deterministic but unknown channel $\mathbf{h} = \sqrt{\beta}\mathbf{a}(\varphi)$. Due to hardware impairments, a deterministic but unknown phase-shift is introduced on the transmitted symbols. Hence, the received signals are
yÍ] = h/ge !? + nll], for l =1,..., Lp, where ¢ € [—7z,7) represents the phase-shift. 
The power q of the transmitted symbols is known at the receiver. Find the ML estimates 
of y, 6, and ¢. 


Exercise 4.19. A packet of symbols is transmitted over the SIMO channel in (4.32). Suppose the transmitter sends the constant $\sqrt{q}$ symbol $L_p$ times so that the receiver can estimate the deterministic but unknown channel $\mathbf{h} = \sqrt{\beta}\mathbf{a}(\varphi, \theta)$ using the received signals $\mathbf{y}[l] = \mathbf{h}\sqrt{q} + \mathbf{n}[l]$, for $l = 1, \ldots, L_p$.


(a) Suppose the receiver has a ULA with M = 2 and $\Delta = \lambda/2$. The array response vector is obtained from (4.120) as $\mathbf{a}(\varphi, \theta) = [1 \ \ e^{-j\pi\sin(\varphi)\cos(\theta)}]^{\mathrm{T}}$. Can the receiver uniquely find the ML estimates of $\varphi$ and $\theta$?


(b) Can we find an M > 2 so the receiver can uniquely find the ML estimates of $\varphi$ and $\theta$?


Chapter 5 


Non-Line-of-Sight Point-to-Point MIMO Channels 


The previous chapter considered free-space LOS channels, where the transmit- 
ted signal only reaches the receiver through a direct, unobstructed path. This 
chapter considers an entirely different setup: There is no LOS path (some 
object blocks it), but many reflected paths. We will first show that these 
multipath channels behave randomly and, thus, can be modeled statistically. 
This is known as a fading channel since the current SNR depends on the 
current random realization of the channel. We will then extend the statistical 
model to MIMO channels and obtain what is known as independent and 
identically distributed (i.i.d.) Rayleigh fading. Depending on how quickly the 
channels vary over time, we will consider different ways of extending the 
capacity concept to handle fading channels. The benefit of using multiple antennas to combat fading variations will be demonstrated, and the spatial fading correlation created by the propagation environment’s geometry will be studied.


5.1 Basics of Multipath Propagation and Rayleigh Fading 


We begin by considering a non-LOS (NLOS) SISO channel where there are L 
different paths that the signal can travel between the transmitter and receiver. 
This is called a multipath propagation channel. The paths are created when 
the electromagnetic wave interacts with various objects in the propagation 
environment, as illustrated in Figure 5.1. In this figure, the LOS path is 
blocked, but one can draw an unobstructed line between the transmitter and 
each object, and between each such object and the receiver. 

The interaction between the wave and the object depends on the shape of 
the object. Figure 5.2 showcases four main categories of interactions. Specular 
reflection refers to the case when the signal wave bounces off the surface 
in a mirror-like way; that is, the incident and outgoing angles are the same 
but on the opposite side of the normal to the object. This type of reflection 
occurs when the object is large and smooth (as compared to the wavelength). 
Figure 5.1: An NLOS SISO channel with L propagation paths, where $d_i$ denotes the total length of the ith path. Each path is generated through interaction with an object in the environment. Figure 5.2 illustrates different types of interactions.


Figure 5.2: An electromagnetic wave can interact with an object in the propagation environment in various ways. The interaction might change both the direction of the signal and its shape, which can become more diffuse.
[Figure 5.2 panels: specular reflection; scattering (diffuse reflection); transmission (with refraction); diffraction]


Scattering is a phenomenon that occurs when the signal wave impinges on a 
rough surface. When the signal bounces off the surface, it will be spread out 
in many different directions, also known as diffuse reflection. Compared to 
specular reflection, the benefit of scattering is the greater chance that one of 
the outgoing wave components propagates toward the receiver. The downside 
is that each component carries relatively little signal energy since the total 
energy of the impinging wave is spread between many directions. Transmission 
refers to when the signal passes through the object, often resulting in a slight 
shift in the propagation direction due to refraction inside the object. The 
object’s material determines what fractions of the signal energy are reflected, 
absorbed, and transmitted to the other side. Finally, diffraction refers to 
the phenomenon that electromagnetic waves can bend around sharp corners, 
resulting in the signal spreading diffusely on the other side. Diffraction also 
happens when the signal passes through holes in an object with a smaller size 
than the wavelength. 

In the context of communications, the important thing is the existence 
of multiple propagation paths, while the type of interaction with the objects 
is secondary. We considered a multipath channel with L paths already in Section 2.3.3 when deriving the memoryless channel model we used in previous chapters. We will now recall some main results and introduce new notation to study the channel properties further. We denote the total length of the ith propagation path by $d_i$ meters. Hence, the signal that is received through the ith path will be time-delayed by $\tau_i = d_i/c = d_i/(f_c\lambda)$ seconds, where $f_c$ is the carrier frequency, $\lambda$ is the wavelength, and $c$ is the speed of light. We let $\bar{d} = \frac{1}{L}\sum_{i=1}^{L} d_i$ denote the average path length. The average path length in wireless communications is usually much larger than the variations $|d_i - \bar{d}|$ around the average. This is because the propagation paths that reach the receiver are mainly created by objects surrounding the transmitter or receiver. Hence, the receiver can sample the received signal using the delay $\eta = \bar{d}/c = \bar{d}/(f_c\lambda)$, and we will obtain a memoryless channel if $B(\bar{d} - d_i)/c \approx 0$ for $i = 1, \ldots, L$. Recall from Section 2.3.4 that this is known as the narrowband assumption since it is always satisfied when the bandwidth is sufficiently small. For example, suppose we interpret $B(\bar{d} - d_i)/c \approx 0$ as requiring that $B|\bar{d} - d_i|/c < 0.1$ for all paths.¹ If the maximum deviation from the average path length $\bar{d}$ is $\max_i |\bar{d} - d_i| = 30$ m, then we will obtain a narrowband channel for $B < 1$ MHz. If $\max_i |\bar{d} - d_i| = 3$ m, then $B < 10$ MHz will result in a narrowband channel.
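The bandwidth limits above follow from solving $B|\bar{d} - d_i|/c < 0.1$ for $B$. The sketch below computes the limit from a list of path lengths; the path lengths are hypothetical and chosen so that the maximum deviation from $\bar{d}$ equals 30 m and 3 m, and $c$ is rounded to $3 \cdot 10^8$ m/s.

```python
def narrowband_bandwidth_limit(path_lengths_m, threshold=0.1, c=3e8):
    """Bandwidth B (Hz) at which B * max_i |d_bar - d_i| / c equals the threshold.

    Any bandwidth below this value satisfies the narrowband criterion for all paths.
    """
    d_bar = sum(path_lengths_m) / len(path_lengths_m)  # average path length
    max_deviation = max(abs(d_bar - d) for d in path_lengths_m)
    return threshold * c / max_deviation


for paths in ([70.0, 100.0, 130.0], [97.0, 100.0, 103.0]):
    B = narrowband_bandwidth_limit(paths)
    print(f"paths {paths} m -> narrowband for B < {B / 1e6:.1f} MHz")
```

Only the spread of the path lengths around $\bar{d}$ matters; shifting all paths by the same amount leaves the limit unchanged.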

We let $\alpha_i \in [0,1]$ denote the attenuation of the ith path, while $\alpha_i^2$ is the gain. We stress that $\alpha_i^2$ should not be computed using the LOS formula in (1.7) since it only applies to direct free-space LOS paths. A path generated through specular reflection might have a gain that resembles that formula, but only when the reflecting surface is huge compared to the wavelength.² Diffusely reflected/scattered paths generally have a much lower channel gain due to the additional spatial dispersion created by the scattering and the material’s absorption losses. We will not assume a specific model in this chapter.

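By utilizing the narrowband assumption, we previously showed in (2.131) that the channel response $h \in \mathbb{C}$ when having L paths can be written as

$$h = \sum_{i=1}^{L} \alpha_i e^{-j2\pi f_c(\tau_i - \eta)} = \sum_{i=1}^{L} \alpha_i e^{-j2\pi \frac{d_i - \bar{d}}{\lambda}}, \tag{5.1}$$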


where the second equality follows from $\tau_i = d_i/(f_c\lambda)$ and our assumption of $\eta = \bar{d}/(f_c\lambda)$. The value of this channel response depends on the distances and attenuations of the L individual paths. Since the attenuations $\alpha_i$ are multiplied by the phase-shift terms $e^{-j2\pi(d_i - \bar{d})/\lambda}$, it is hard to tell whether the terms in the sum will reinforce or cancel each other. This depends on whether the phases happen to be aligned or not. When the transmitter or receiver moves, $d_1, \ldots, d_L$ will change. Even if the movement is only over a distance on the order of the wavelength $\lambda$, this might substantially change all the phase-shifts in (5.1) and thereby change whether the terms reinforce or cancel each other. This phenomenon is called multipath fading and explains why small movements can give rise to seemingly random changes in the channel response.


¹ The upper bound depends on which pulse function $p(t)$ is utilized in the PAM because this determines for which time-shifts the intersymbol interference can be neglected. In this case, we selected 0.1 based on the fact that $\mathrm{sinc}(\pm 0.1) \approx \mathrm{sinc}(0) = 1$ and $\mathrm{sinc}(l \pm 0.1) \approx \mathrm{sinc}(l) = 0$ for $l = \pm 1, \pm 2, \ldots$. If we used a pulse that varies more slowly than the sinc function, then we might expand the delay spread that we can manage without having intersymbol interference.

² Many objects that behave as specularly reflecting mirrors for visible light are too small to behave in that way in wireless communications because the wavelength might be 100 000 times larger (compare 4 GHz communication with visible light, which starts at 400 THz).
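The two expressions in (5.1) can be evaluated side by side to confirm that they agree and to illustrate how sensitive $h$ is to small changes in the path lengths. In the sketch below, the number of paths, the attenuations, the path lengths, and the carrier frequency are arbitrary example values drawn with a fixed seed, and receiver motion is modeled crudely (an assumption, not a geometric model) by perturbing every path length independently by at most a quarter wavelength.

```python
import numpy as np

C_LIGHT = 3e8  # speed of light in m/s (rounded)


def h_from_lengths(alphas, d, wavelength, d_bar):
    """Second form of (5.1): sum_i alpha_i exp(-j 2 pi (d_i - d_bar) / lambda)."""
    return np.sum(alphas * np.exp(-2j * np.pi * (d - d_bar) / wavelength))


def h_from_delays(alphas, d, fc, d_bar):
    """First form of (5.1) with tau_i = d_i / c and sampling delay eta = d_bar / c."""
    tau, eta = d / C_LIGHT, d_bar / C_LIGHT
    return np.sum(alphas * np.exp(-2j * np.pi * fc * (tau - eta)))


rng = np.random.default_rng(seed=1)
fc = 3e9                          # example carrier frequency (Hz)
wavelength = C_LIGHT / fc         # 0.1 m
L = 10
alphas = rng.uniform(0, 0.01, L)  # example attenuations
d = rng.uniform(90, 120, L)       # example path lengths (m)
d_bar = d.mean()                  # the sampling delay eta = d_bar / c is kept fixed

h0 = h_from_lengths(alphas, d, wavelength, d_bar)
print("both forms agree:", np.isclose(h0, h_from_delays(alphas, d, fc, d_bar)))

# Perturb every path length by at most lambda/4 and observe the change in |h|
for _ in range(3):
    d_moved = d + rng.uniform(-wavelength / 4, wavelength / 4, L)
    h_moved = h_from_lengths(alphas, d_moved, wavelength, d_bar)
    print(f"|h| before: {abs(h0):.4f}, after a small move: {abs(h_moved):.4f}")
```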


Example 5.1. Consider an environment with L = 2 objects creating propagation paths with identical phases. What is the shortest distance the receiver can move to risk that the paths have phases that differ by $\pi$ instead?


Suppose for simplicity that $e^{-j2\pi(d_1 - \bar{d})/\lambda} = e^{-j2\pi(d_2 - \bar{d})/\lambda} = 1$ at the initial point. When the receiver moves, the distances $d_1$ and $d_2$ will change, and the phases will rotate. If the receiver moves a distance $\delta > 0$, the path lengths $d_1, d_2$ can at most increase or decrease by $\delta$, depending on the direction of motion compared to the direction that the signal components arrive from. The largest phase difference between the two paths occurs when the receiver moves directly towards object 2 so that $d_2$ shrinks to $d_2 - \delta$, while simultaneously moving away from object 1 so that $d_1$ increases to $d_1 + \delta$ (or the other way around). The respective phase-shifts will then become

$$e^{-j2\pi \frac{d_1 + \delta - \bar{d}}{\lambda}} = e^{-j2\pi\frac{d_1 - \bar{d}}{\lambda}} e^{-j2\pi\frac{\delta}{\lambda}} = e^{-j2\pi\frac{\delta}{\lambda}}, \tag{5.2}$$

$$e^{-j2\pi \frac{d_2 - \delta - \bar{d}}{\lambda}} = e^{-j2\pi\frac{d_2 - \bar{d}}{\lambda}} e^{j2\pi\frac{\delta}{\lambda}} = e^{j2\pi\frac{\delta}{\lambda}}. \tag{5.3}$$

The difference between these phases is $4\pi\delta/\lambda$, which becomes $\pi$ if $\delta = \lambda/4$. Hence, whenever the receiver moves a quarter of the wavelength, there is a risk that the multipath propagation will change so radically that the paths are canceling out instead of reinforcing one another.
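The conclusion of Example 5.1 can be visualized by summing the phase-shifts (5.2) and (5.3). Assuming, beyond what the example states, that both paths have the same attenuation $\alpha$, the sum equals $2\alpha\cos(2\pi\delta/\lambda)$, which vanishes at $\delta = \lambda/4$. The sketch below evaluates it for a few displacements; the wavelength is an example value.

```python
import numpy as np


def two_path_response(delta, wavelength, alpha=1.0):
    """Sum of (5.2) and (5.3) scaled by a common attenuation alpha (an added assumption)."""
    path1 = np.exp(-2j * np.pi * delta / wavelength)  # d1 -> d1 + delta, see (5.2)
    path2 = np.exp(+2j * np.pi * delta / wavelength)  # d2 -> d2 - delta, see (5.3)
    return alpha * (path1 + path2)


wavelength = 0.1  # example: 3 GHz carrier
for delta in (0.0, wavelength / 8, wavelength / 4, wavelength / 2):
    h = two_path_response(delta, wavelength)
    phase_diff = 4 * np.pi * delta / wavelength
    print(f"delta = {delta:.4f} m: phase difference = {phase_diff:.3f} rad, |h| = {abs(h):.3f}")
```

After half a wavelength the phase difference reaches $2\pi$ and the paths reinforce each other again, which is why the fading pattern repeats on a sub-wavelength scale.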


5.1.1 Rich Multipath Propagation: Rayleigh Fading 


When the number of paths (L) is huge, we have a scenario known as rich 
multipath propagation. This is often a valid assumption in NLOS communica- 
tions due to the many paths created by scattering. We will derive a statistical 
distribution for the channel response h in this case. Since there is a large 
set of path attenuations $\alpha_i$ and distances $d_i$, it makes sense to model their values statistically. We assume that $\alpha_1, \ldots, \alpha_L$ are independent realizations of a random variable $A$, which describes how the channel attenuations vary between different objects in the environment.


We further use $\psi_i = 2\pi\frac{d_i - \bar{d}}{\lambda} + 2\pi k_i$ to denote the phase-shift of the ith path in (5.1), where the integer $k_i$ is selected so that $\psi_i \in [-\pi, \pi)$, for $i = 1, \ldots, L$. We can wrap any phase-shift into the interval $[-\pi, \pi)$ without loss of generality since $e^{-j\psi}$ is a periodic function with period $2\pi$. When there are many paths, it is likely that the path differences $|d_i - \bar{d}|$ range from zero up to many wavelengths, which implies that $\psi_i$ will likely be uniformly distributed between $-\pi$ and $\pi$. Hence, we assume $\psi_1, \ldots, \psi_L$ are independent realizations of a random variable with a continuous uniform distribution between $-\pi$ and $\pi$. This is denoted as $\psi_i \sim U[-\pi, \pi)$, and the corresponding PDF is

$$f(\psi) = \begin{cases} \frac{1}{2\pi}, & \text{if } -\pi \leq \psi < \pi, \\ 0, & \text{otherwise}. \end{cases} \tag{5.4}$$


The purpose of this statistical modeling is to emphasize that (5.1) can be 
viewed as the summation of L independent realizations drawn from the same 
random distribution. We can separate the sum into two parts: 


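$$h = \sum_{i=1}^{L} \alpha_i e^{-j\psi_i} = \underbrace{\sum_{i=1}^{L} \alpha_i \cos(\psi_i)}_{\text{Real part}} - j \underbrace{\sum_{i=1}^{L} \alpha_i \sin(\psi_i)}_{\text{Imaginary part}}. \tag{5.5}$$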


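The real and imaginary parts have zero means because the integral over a period of a cosine/sine function is zero. These parts are uncorrelated since

$$\mathbb{E}\left\{ \sum_{i=1}^{L} \alpha_i \cos(\psi_i) \sum_{j=1}^{L} \alpha_j \sin(\psi_j) \right\} = \sum_{i=1}^{L} \mathbb{E}\{\alpha_i^2\} \underbrace{\mathbb{E}\{\cos(\psi_i)\sin(\psi_i)\}}_{=0} = 0 \tag{5.6}$$

when $\psi_i \sim U[-\pi, \pi)$.³ Here, the cross terms with $i \neq j$ vanish because the phase-shifts are independent and $\mathbb{E}\{\sin(\psi_j)\} = 0$. One can also show that the real and imaginary parts in (5.5) have the same variance since $\mathbb{E}\{\cos^2(\psi_i)\} = \mathbb{E}\{\sin^2(\psi_i)\} = 1/2$.⁴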
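These statistical properties can be illustrated with a short Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: it uses a hypothetical number of paths $L = 10$ and equal attenuations $\alpha_i = 1/\sqrt{L}$ (so that $\mathbb{E}\{|h|^2\} = 1$), draws $\psi_i \sim U[-\pi, \pi)$, and estimates the means, variances, and cross-correlation of the real and imaginary parts in (5.5).

```python
import math
import random

rng = random.Random(1)
L = 10                     # number of paths (hypothetical)
alpha = 1 / math.sqrt(L)   # equal attenuations (assumption), so that E{|h|^2} = 1
N = 20_000                 # number of channel realizations

real_parts, imag_parts = [], []
for _ in range(N):
    psi = [rng.uniform(-math.pi, math.pi) for _ in range(L)]
    real_parts.append(sum(alpha * math.cos(p) for p in psi))    # real part in (5.5)
    imag_parts.append(-sum(alpha * math.sin(p) for p in psi))   # imaginary part in (5.5)

mean_re = sum(real_parts) / N
mean_im = sum(imag_parts) / N
# The means are zero, so these second moments are the variances
var_re = sum(x * x for x in real_parts) / N
var_im = sum(y * y for y in imag_parts) / N
# Left-hand side of (5.6), up to the sign of the imaginary part
cross = sum(x * y for x, y in zip(real_parts, imag_parts)) / N

print(f"means: {mean_re:+.3f}, {mean_im:+.3f}")
print(f"variances: {var_re:.3f}, {var_im:.3f} (both close to 0.5)")
print(f"cross-correlation: {cross:+.3f}")
```

Both variances land close to $1/2$, which is the per-dimension variance of $\mathcal{N}_{\mathbb{C}}(0, 1)$.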

When $L$ is large, we can utilize the central limit theorem, stated in Lemma 2.6, to obtain an approximate random distribution of $h$. This result states that the sum of many independent and identically distributed real-valued random variables becomes approximately Gaussian distributed.
We can apply this theorem to the real and imaginary parts of h in (5.5) to 
motivate that both are approximately Gaussian distributed. Since we have 
shown that the real and imaginary parts are also uncorrelated, it follows 
from the Gaussian distribution that they are also approximately independent. 
Hence, the channel response in a rich multipath environment is approximately 
complex Gaussian distributed: 


$$h \sim \mathcal{N}_{\mathbb{C}}(0, \beta). \tag{5.7}$$


We let $\beta = \mathbb{E}\{|h|^2\}$ denote the average channel gain of $h$ to obtain a notation where the average SNR is denoted in the same way as in LOS communications. Note that $\beta$ is also the variance of $h$ since the mean value is zero. As explained in Section 2.2.2, the full name of the distribution in (5.7) is the circularly symmetric complex Gaussian distribution, where the circular symmetry means that $h$ and $he^{-j\psi}$ have the same distribution for any phase-shift $\psi$. This property can be observed in Figure 2.6, where the PDF remains the same if it is rotated around the origin.

³ There is a trigonometric identity saying that $\cos(\psi_i)\sin(\psi_i) = \sin(2\psi_i)/2$. For $\psi_i \sim U[-\pi, \pi)$, it follows that $\mathbb{E}\{\cos(\psi_i)\sin(\psi_i)\} = \mathbb{E}\{\sin(2\psi_i)\}/2$ is an integral over two periods of the sine function; thus, it is equal to zero.

⁴ There is another trigonometric identity saying that $\cos^2(\psi_i) = 1/2 + \cos(2\psi_i)/2$. For $\psi_i \sim U[-\pi, \pi)$, it follows that $\mathbb{E}\{\cos^2(\psi_i)\} = 1/2$. Similarly, using the trigonometric identity $\sin^2(\psi_i) = 1/2 - \cos(2\psi_i)/2$, it follows that $\mathbb{E}\{\sin^2(\psi_i)\} = 1/2$.

The type of channel distribution in (5.7) is commonly known as Rayleigh fading. The reason is that the channel magnitude $|h|$ is Rayleigh distributed, as previously described in Section 2.2.5. Specifically, $|h| \sim \text{Rayleigh}(\sqrt{\beta/2})$, resulting in the PDF

$$f_{|h|}(x) = \frac{2x}{\beta} e^{-\frac{x^2}{\beta}}, \quad \text{for } x \ge 0. \tag{5.8}$$

This PDF is illustrated in Figure 5.3(a) for $\beta = 1$. We observe that $|h|$ has most of its probability mass between 0 and 3. The mean value can be shown to be $\sqrt{\pi}/2 \approx 0.9$. The PDF does not look particularly strange, but an important characteristic is emphasized in Figure 5.3(b), where we show the same PDF using a logarithmic scale on the horizontal axis. We can then notice that most channel realizations will give $|h| \approx 1$, but there is also a substantial risk of getting a value closer to zero. For example, $|h| < 0.5$ happens in 22% of the realizations and $|h| < 10^{-1}$ happens in 1% of the realizations. When the magnitude of the channel is this small, we say that it is in a deep fade.
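The percentages above follow from the Rayleigh CDF. Integrating (5.8) gives $\Pr\{|h| \le x\} = 1 - e^{-x^2/\beta}$, and the mean of $\text{Rayleigh}(\sqrt{\beta/2})$ is $\sqrt{\pi\beta}/2$. The small Python sketch below evaluates these closed-form expressions for $\beta = 1$; the function names are illustrative only.

```python
import math

def rayleigh_cdf(x, beta=1.0):
    """Pr{|h| <= x} for h ~ N_C(0, beta), obtained by integrating (5.8)."""
    return 1 - math.exp(-x * x / beta)

def rayleigh_mean(beta=1.0):
    """E{|h|} = sqrt(pi * beta) / 2 for |h| ~ Rayleigh(sqrt(beta / 2))."""
    return math.sqrt(math.pi * beta) / 2

print(f"E{{|h|}} = {rayleigh_mean():.3f}")            # E{|h|} = 0.886
print(f"Pr{{|h| < 0.5}} = {rayleigh_cdf(0.5):.3f}")   # Pr{|h| < 0.5} = 0.221
print(f"Pr{{|h| < 0.1}} = {rayleigh_cdf(0.1):.4f}")   # Pr{|h| < 0.1} = 0.0100
```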
Relatively few propagation paths are sufficient to approximately observe Rayleigh fading, especially if the paths have roughly the same channel gains $\alpha_i^2$. This will be the case when the scattering is close to the transmitter and/or receiver, so the path lengths $d_1, \ldots, d_L$ are roughly the same. The convergence to Rayleigh fading is illustrated in Figure 5.4 by showing the CDF

$$F_{|h|}(x) = \int_0^x f_{|h|}(t)\, dt \tag{5.9}$$

of $|h|$ with $L = 2$, $L = 5$, and Rayleigh fading that is obtained as $L \to \infty$. Recall that the CDF of the Rayleigh distribution was given in (2.102). We have assumed $\alpha_i = 1/\sqrt{L}$ and $\psi_i \sim U[-\pi, \pi)$. The curves with Rayleigh fading and $L = 5$ are nearly the same, while $L = 2$ gives a different shape. Hence, five propagation paths with equal attenuation and random phases are sufficient to obtain a channel that can be modeled by Rayleigh fading.
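The comparison in Figure 5.4 can be approximated by simulation. The sketch below, a Monte Carlo illustration rather than the procedure used to produce the figure, draws $|h|$ with $\alpha_i = 1/\sqrt{L}$ and $\psi_i \sim U[-\pi, \pi)$ for $L = 2$ and $L = 5$, and reports the largest gap between the empirical CDF and the Rayleigh CDF $1 - e^{-x^2}$ over a grid of $x$ values. The sample size and grid are hypothetical choices.

```python
import bisect
import cmath
import math
import random

def sample_magnitudes(L, n, rng):
    """Sorted samples of |h| = |sum_i alpha_i e^{-j psi_i}| with alpha_i = 1/sqrt(L)."""
    alpha = 1 / math.sqrt(L)
    return sorted(
        abs(sum(alpha * cmath.exp(-1j * rng.uniform(-math.pi, math.pi)) for _ in range(L)))
        for _ in range(n)
    )

def max_cdf_gap(sorted_mags, grid):
    """Largest gap between the empirical CDF and the Rayleigh CDF with beta = 1."""
    n = len(sorted_mags)
    return max(
        abs(bisect.bisect_right(sorted_mags, x) / n - (1 - math.exp(-x * x)))
        for x in grid
    )

rng = random.Random(0)
grid = [k / 100 for k in range(1, 301)]   # x from 0.01 to 3.00
gaps = {L: max_cdf_gap(sample_magnitudes(L, 100_000, rng), grid) for L in (2, 5)}
for L, gap in gaps.items():
    print(f"L = {L}: largest CDF gap to Rayleigh = {gap:.3f}")
```

With $L = 2$, the magnitude can never exceed $\sqrt{2}$, so its CDF reaches one while the Rayleigh CDF has not; with $L = 5$, the gap is much smaller.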

Random channels are generally called fading channels since the SNR $q|h|^2/N_0$ varies depending on the realization of $h$. The cause of the variations is the summation of the many complex exponentials in (5.1), which cancel each other out by having very different phases when the channel is in a deep fade. This is essentially an extension of Example 5.1, which showcased how two paths cancel each other when their phase-shifts differ by $\pi$. As we will see later in this chapter, the fading variations are problematic in wireless communications and require us to define the channel capacity differently.
(a) Probability density function using a linear scale on the horizontal axis.
(b) Probability density function using a log-scale on the horizontal axis.
Figure 5.3: The probability density function $2xe^{-x^2}$ of $x = |h|$, when $h \sim \mathcal{N}_{\mathbb{C}}(0, 1)$. This channel distribution is known as Rayleigh fading and is characterized by occasional deep fades where $|h|$ is much smaller than the average value.
Although the distribution of the magnitude $|h|$ has given rise to the term “Rayleigh fading,” it is mostly the distribution of the squared magnitude $|h|^2$ that is useful when analyzing the communication over Rayleigh fading channels. We will utilize it later in this chapter. It was shown in Section 2.2.5 that it has an exponential distribution: $|h|^2 \sim \text{Exp}(1/\beta)$ with the PDF

$$f_{|h|^2}(x) = \frac{1}{\beta} e^{-\frac{x}{\beta}}, \quad \text{for } x \ge 0. \tag{5.10}$$


Example 5.2. The derivation of Rayleigh fading distribution assumes that 
there are L independent and identically distributed propagation paths. What 
happens to the distribution if there is also an LOS path? 

The distinguishing property of the LOS path is that its gain is much stronger than that of the NLOS paths, so it cannot be included when applying the central limit theorem. If we denote the LOS channel gain as $\alpha_0^2$ and the phase-shift as $\psi_0 \sim U[-\pi, \pi)$, then the channel response can be expressed as

$$h = \alpha_0 e^{-j\psi_0} + \sum_{i=1}^{L} \alpha_i e^{-j\psi_i} \tag{5.11}$$

$$h \to \alpha_0 e^{-j\psi_0} + h_{\text{NLOS}} \quad \text{as } L \to \infty, \tag{5.12}$$

where $h_{\text{NLOS}} \sim \mathcal{N}_{\mathbb{C}}(0, \beta_{\text{NLOS}})$ is Rayleigh fading created by the NLOS paths. This channel model is called Rician fading since $|h| \sim \text{Rice}(\alpha_0, \sqrt{\beta_{\text{NLOS}}/2})$ has a Rician distribution.* When using this alternative model, it is common to let $\beta = \mathbb{E}\{|h|^2\} = \alpha_0^2 + \beta_{\text{NLOS}}$ denote the average gain of the entire channel and define the so-called $\kappa$-factor determining how the gain is divided between the LOS and NLOS paths:

$$\kappa = \frac{\alpha_0^2}{\beta_{\text{NLOS}}}. \tag{5.13}$$

Using this notation, we can generate random channel realizations as

$$h = \sqrt{\frac{\beta\kappa}{\kappa + 1}}\, e^{-j\psi_0} + \sqrt{\frac{1}{\kappa + 1}}\, \mathcal{N}_{\mathbb{C}}(0, \beta), \tag{5.14}$$

where the phase of the LOS path and the Rayleigh fading created by the NLOS paths are the two sources of randomness. Using this notation, we also have that $|h| \sim \text{Rice}\left(\sqrt{\beta\kappa/(\kappa + 1)}, \sqrt{\beta/(2(\kappa + 1))}\right)$.


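* The Rician distribution $x \sim \text{Rice}(\nu, \sigma)$ has the PDF $f(x) = \frac{x}{\sigma^2} e^{-\frac{x^2 + \nu^2}{2\sigma^2}} I_0\!\left(\frac{x\nu}{\sigma^2}\right)$, where $I_0(z) = \frac{1}{2\pi}\int_0^{2\pi} e^{z\cos(t)}\, dt$ is the zeroth-order modified Bessel function of the first kind.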
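The generation procedure in (5.14) is straightforward to implement. The sketch below is a minimal Python illustration (not the code behind Figure 5.5): it draws Rician realizations with $\beta = 1$ for the three $\kappa$-factors discussed next, checks that the average gain stays at $\beta$, and estimates the deep-fade probability $\Pr\{|h| < 0.1\}$. The sample size and the threshold 0.1 are hypothetical choices.

```python
import cmath
import math
import random

def rician_channel(beta, kappa, rng):
    """One realization of (5.14): LOS term with a random phase plus scaled Rayleigh fading."""
    psi0 = rng.uniform(-math.pi, math.pi)
    los = math.sqrt(beta * kappa / (kappa + 1)) * cmath.exp(-1j * psi0)
    std = math.sqrt(beta / 2)                    # N_C(0, beta): each part is N(0, beta/2)
    nlos = complex(rng.gauss(0, std), rng.gauss(0, std))
    return los + math.sqrt(1 / (kappa + 1)) * nlos

rng = random.Random(2)
N = 100_000
results = {}
for kappa in (0, 1, 10):
    samples = [rician_channel(1.0, kappa, rng) for _ in range(N)]
    avg_gain = sum(abs(h) ** 2 for h in samples) / N         # should stay close to beta = 1
    deep_fade = sum(1 for h in samples if abs(h) < 0.1) / N   # estimate of Pr{|h| < 0.1}
    results[kappa] = (avg_gain, deep_fade)
    print(f"kappa = {kappa:2d}: E{{|h|^2}} ~ {avg_gain:.3f}, Pr{{|h| < 0.1}} ~ {deep_fade:.4f}")
```

The average gain is unaffected by $\kappa$, while deep fades become rarer as the LOS component grows.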


Under the Rician fading model, the PDF of $|h|$ can have a shape that differs substantially from Rayleigh fading. Figure 5.5 shows the PDF with $\beta = 1$ and three values of the $\kappa$-factor. Rayleigh fading is given by $\kappa = 0$, while $\kappa = 1$ represents the scenario when the strength of the LOS path is identical to the average combined strength of all the NLOS paths. The existence of the LOS path shifts the probability mass slightly towards $\sqrt{\beta} = 1$, but the difference from Rayleigh fading is not so large, and deep fades still occur. In contrast, $\kappa = 10$ results in a PDF more confined around 1, so small and large realizations are much less likely than under Rayleigh fading.

Figure 5.4: The CDF $\Pr\{|h| \le x\}$ of the channel magnitude $|h|$ with Rayleigh fading and with $L = 2$ or $L = 5$ paths with constant gain and uniformly distributed phases. It is sufficient to have five paths to approximately observe Rayleigh fading.

Figure 5.5: The probability density function of $|h|$ for Rician fading with $\beta = 1$ and different values of the $\kappa$-factor.
The remainder of this chapter considers Rayleigh fading since this is the
more problematic scenario. The corresponding PDF expression is also tractable 
for performance analysis and developing methods that counteract fading. 


5.1.2 Independent Rayleigh Fading in SIMO and MISO Channels 


The Rayleigh fading channel model will now be extended to systems with 
multiple antennas. Building on the results in the previous section, we can 
expect that the channel between one transmit antenna and one receive antenna 
can be modeled by the complex Gaussian distribution in (5.7) under rich 
multipath conditions. Hence, every entry $h_{m,k}$ of the MIMO channel matrix $\mathbf{H}$ can be modeled this way. The remaining question is how these channel coefficients are related to each other. Will $h_{1,1}$ and $h_{m,k}$ be statistically independent or correlated? Will they have different variances? These questions will be answered in this section for the considered frequency-flat channel.

We begin by considering a SIMO channel where a ULA with $M$ antennas and antenna spacing $\Delta$ receives a signal from a single-antenna transmitter. We assume the ULA receives signal components via $L$ objects in different angular directions in the three-dimensional world. We will use the spherical coordinate system in Figure 1.9 and assume the ULA is located along the z-axis, as in Example 4.16. The reason is that the corresponding array response in (4.122) is independent of the azimuth angle, which will simplify the presentation in this section. We let $\alpha_i \in [0, 1]$ denote the attenuation of the $i$th path, $\psi_i$ is the non-zero phase-shift⁵ at the reference antenna, and $\theta_i$ is the angle-of-arrival in the elevation domain. The channel response $\mathbf{h} \in \mathbb{C}^M$ can then be written as


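$$\mathbf{h} = \sum_{i=1}^{L} \alpha_i e^{-j\psi_i} \begin{bmatrix} 1 \\ e^{-j2\pi \frac{\Delta \sin(\theta_i)}{\lambda}} \\ e^{-j2\pi \frac{2\Delta \sin(\theta_i)}{\lambda}} \\ \vdots \\ e^{-j2\pi \frac{(M-1)\Delta \sin(\theta_i)}{\lambda}} \end{bmatrix}, \tag{5.15}$$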

where the summation resembles the SISO channel in (5.5) but the $i$th term is multiplied by the array response vector from (4.122) for a signal arriving from $\theta_i$. This setup is illustrated in Figure 5.6. We assume $\psi_1, \ldots, \psi_L$ are independent realizations of a uniform distribution between 0 and $2\pi$. We further assume $\alpha_1, \ldots, \alpha_L$ are independent realizations of a random variable and denote the average channel gain as

$$\beta = \sum_{i=1}^{L} \mathbb{E}\{\alpha_i^2\}. \tag{5.16}$$


⁵ In free-space LOS communications, there is only one path so we can synchronize the receiver
such that there is no phase-shift at the reference antenna. This cannot be done when there are 
multiple paths, which is why 7; is needed in this case. 
Figure 5.6: A ULA in an isotropic rich multipath environment where the multipath components
are received from random elevation angle directions with uniform distribution. 


In an isotropic rich multipath environment, the number of multipath components $L$ is large, and their locations are uniformly/isotropically distributed over all directions. Recall that $\varphi \in [-\pi, \pi)$ denotes the azimuth angle, while $\theta \in [-\pi/2, \pi/2]$ denotes the elevation angle in the spherical coordinate system. The PDF of a uniform distribution over a unit sphere is given by

$$f_{\varphi,\theta}(\varphi, \theta) = \frac{\cos(\theta)}{4\pi}, \quad -\pi \le \varphi < \pi, \ -\frac{\pi}{2} \le \theta \le \frac{\pi}{2}, \tag{5.17}$$

where $4\pi$ is the surface area of the unit sphere and $\cos(\theta)\,\partial\theta\,\partial\varphi$ is the area of a surface element in direction $(\varphi, \theta)$ that appears when integrating over a sphere using spherical coordinates as in (1.27). The channel in (5.15) does not depend on the azimuth angle $\varphi$; there is a rotational invariance when using a ULA that can be observed in Figure 1.18 and Figure 1.20. Hence, we only need the marginal PDF

$$f_\theta(\theta) = \int_{-\pi}^{\pi} \frac{\cos(\theta)}{4\pi}\, d\varphi = \frac{\cos(\theta)}{2}, \quad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2}, \tag{5.18}$$


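when characterizing the statistical channel properties in this section. The entry $h_m = \sum_{i=1}^{L} \alpha_i e^{-j\psi_i} e^{-j2\pi \frac{(m-1)\Delta \sin(\theta_i)}{\lambda}}$ in $\mathbf{h} = [h_1, \ldots, h_M]^{\mathrm{T}}$ has the mean

$$\mathbb{E}\{h_m\} = \sum_{i=1}^{L} \mathbb{E}\{\alpha_i\} \underbrace{\mathbb{E}\{e^{-j\psi_i}\}}_{=0} \mathbb{E}\left\{e^{-j2\pi \frac{(m-1)\Delta \sin(\theta_i)}{\lambda}}\right\} = 0, \tag{5.19}$$

where $\mathbb{E}\{e^{-j\psi_i}\} = 0$ follows from the fact that the phase-shifts $\psi_i$ are uniformly distributed between 0 and $2\pi$. Furthermore, the variance is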


(m—-1)A sin(@;) 
A 


L 
Var{hm} = E {|hm|?} = E4 [X aje Me 


L 
=) E{aj} =8. (5.20) 


i=l 


If L is large, we can model this channel coefficient as 
hm ~ Nc(0, 8), (5.21) 


according to the central limit theorem in Lemma 2.6 (following the same procedure as in the last section). All the channel coefficients in $\mathbf{h}$ have the same marginal distribution, including mean and variance. The multipath environment creates a random process that determines the fading realizations at all spatial locations, and the channel coefficients are samples taken from that process at the antenna locations. Hence, the coefficients are also jointly complex Gaussian distributed. It remains to determine if the coefficients are correlated. To this end, we consider two different channel coefficients $h_m$ and $h_n$, for which $m \neq n$, and compute the correlation


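$$\begin{aligned} \mathbb{E}\{h_m h_n^*\} &= \mathbb{E}\left\{\sum_{i=1}^{L} \alpha_i e^{-j\psi_i} e^{-j2\pi \frac{(m-1)\Delta \sin(\theta_i)}{\lambda}} \sum_{j=1}^{L} \alpha_j e^{j\psi_j} e^{j2\pi \frac{(n-1)\Delta \sin(\theta_j)}{\lambda}}\right\} \\ &= \sum_{i=1}^{L} \mathbb{E}\{\alpha_i^2\}\, \mathbb{E}\left\{e^{j2\pi \frac{(n-m)\Delta \sin(\theta_i)}{\lambda}}\right\}, \end{aligned} \tag{5.22}$$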


where the last equality follows from the fact that $\mathbb{E}\{e^{-j\psi_i} e^{j\psi_j}\} = \mathbb{E}\{e^{-j\psi_i}\}\mathbb{E}\{e^{j\psi_j}\} = 0$ for $i \neq j$ because the phase-shifts are independent and uniformly distributed between 0 and $2\pi$. We can use the PDF in (5.18) to compute the last mean value in (5.22):


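$$\begin{aligned} \mathbb{E}\left\{e^{j2\pi \frac{(n-m)\Delta \sin(\theta_i)}{\lambda}}\right\} &= \int_{-\pi/2}^{\pi/2} e^{j2\pi \frac{(n-m)\Delta \sin(\theta_i)}{\lambda}}\, \frac{\cos(\theta_i)}{2}\, d\theta_i = \left[\frac{\lambda\, e^{j2\pi \frac{(n-m)\Delta \sin(\theta_i)}{\lambda}}}{j4\pi(n-m)\Delta}\right]_{-\pi/2}^{\pi/2} \\ &= \frac{\lambda \sin\left(\frac{2\pi(n-m)\Delta}{\lambda}\right)}{2\pi(n-m)\Delta} = \mathrm{sinc}\left(\frac{2(n-m)\Delta}{\lambda}\right). \end{aligned} \tag{5.23}$$

By using this expression and (5.16), the correlation in (5.22) becomes

$$\mathbb{E}\{h_m h_n^*\} = \beta\, \mathrm{sinc}\left(\frac{2(n-m)\Delta}{\lambda}\right). \tag{5.24}$$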
This value is generally non-zero, meaning the channel coefficients are generally statistically correlated. This is known as spatial correlation since we measure the correlation between the channel coefficients observed at different spatial locations. However, we can identify specific antenna spacings that give uncorrelated channels. Since $n - m$ is an integer and the sinc function is zero for all non-zero integer arguments, the expression in (5.24) is zero if $2\Delta/\lambda$ is an integer. In particular, this happens for the antenna spacing $\Delta = \lambda/2$, which is yet another reason why the half-wavelength spacing is popular when considering ULAs. Intuitively, having uncorrelated channel coefficients in the array is preferable because every receive antenna provides unique information. We will see later that it is an important property to combat the adverse effects of fading. Note that if $h_m$ and $h_n$ are uncorrelated, they are also independent since they are jointly complex Gaussian distributed.
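The derivation from (5.15) to (5.24) can be cross-checked by simulating the multipath model directly. The sketch below is a hedged illustration under assumptions not stated in the text: a hypothetical $L = 50$ paths with equal attenuations $\alpha_i = \sqrt{\beta/L}$, $\beta = 1$, $\lambda = 0.1$ m, and two adjacent antennas. It uses the fact that if $\theta_i$ has the PDF $\cos(\theta)/2$ in (5.18), then $\sin(\theta_i)$ is uniformly distributed on $[-1, 1]$.

```python
import cmath
import math
import random

def sinc(x):
    """Normalized sinc function sin(pi x) / (pi x), as used in (5.23)-(5.24)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def simo_channel(M, spacing, wavelength, L, beta, rng):
    """One realization of (5.15) with L paths and equal attenuations (assumption)."""
    alpha = math.sqrt(beta / L)            # gives beta = sum_i E{alpha_i^2} in (5.16)
    h = [0j] * M
    for _ in range(L):
        phase = cmath.exp(-1j * rng.uniform(0, 2 * math.pi))   # e^{-j psi_i}
        # theta_i with PDF cos(theta)/2 in (5.18) means sin(theta_i) ~ U[-1, 1]
        sin_theta = rng.uniform(-1, 1)
        for m in range(M):                 # index m = 0 is the reference antenna
            steering = cmath.exp(-2j * math.pi * m * spacing * sin_theta / wavelength)
            h[m] += alpha * phase * steering
    return h

def estimate_correlation(spacing, wavelength=0.1, L=50, trials=4000, seed=3):
    """Monte Carlo estimate of E{h_1 h_2^*} for two adjacent antennas with beta = 1."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(trials):
        h = simo_channel(2, spacing, wavelength, L, 1.0, rng)
        total += h[0] * h[1].conjugate()
    return total / trials

wavelength = 0.1
estimates = {}
for fraction in (0.25, 0.5, 1.0):
    estimates[fraction] = estimate_correlation(fraction * wavelength, wavelength)
    print(f"spacing {fraction} wavelengths: simulated {estimates[fraction].real:+.3f}, "
          f"(5.24) gives {sinc(2 * fraction):+.3f}")
```

The simulated correlation is close to $2/\pi \approx 0.64$ for quarter-wavelength spacing and close to zero for spacings of $\lambda/2$ and $\lambda$, in line with (5.24).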

Spatial correlation is sometimes referred to as antenna correlation, but this 
is a misnomer since the antennas are physical objects, not random variables. 
It is channel coefficients observed at different antennas that can be correlated, 
both in time and space. It is the spatial correlation we focus on in this chapter. 


Example 5.3. Consider two antennas located at the Cartesian coordinates $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, respectively. What is the spatial correlation between their channel coefficients in an isotropic rich multipath environment?

There is a distance $\delta = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$ between the antennas. Since the multipath components are uniformly distributed over all directions, we can shift and rotate the Cartesian coordinate system without changing the statistical distribution. In particular, we can place the origin at the first antenna and point the z-axis in the direction of the second antenna. We then have a ULA along the z-axis with $M = 2$ antennas and antenna spacing $\delta$. It then follows from (5.24) that the correlation between the channel coefficients $h_1$ and $h_2$ at the two antennas is

$$\mathbb{E}\{h_1 h_2^*\} = \beta\, \mathrm{sinc}\left(\frac{2\delta}{\lambda}\right), \tag{5.25}$$

where $\beta$ denotes the common channel gain. The conclusion is that spatial
correlation in an isotropic rich multipath environment only depends on the 
distance between the antennas, not their exact locations. 
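Example 5.3 can be expressed as a small helper function. The sketch below is illustrative (the function names and antenna coordinates are hypothetical) and evaluates (5.25) for antenna pairs with the same separation but different positions and orientations, which all give the same correlation.

```python
import math

def sinc(x):
    """Normalized sinc function sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def isotropic_correlation(p1, p2, wavelength, beta=1.0):
    """E{h_1 h_2^*} from (5.25); only the distance between p1 and p2 matters."""
    delta = math.dist(p1, p2)   # Euclidean distance (Python 3.8+)
    return beta * sinc(2 * delta / wavelength)

wavelength = 0.1
# Quarter-wavelength separation, oriented along different axes at different locations
along_x = isotropic_correlation((0, 0, 0), (0.025, 0, 0), wavelength)
along_z = isotropic_correlation((1, 2, 3), (1, 2, 3.025), wavelength)
# Half-wavelength separation in a diagonal direction gives (numerically) zero correlation
half = isotropic_correlation((0, 0, 0), (0.03, 0.04, 0), wavelength)
print(along_x, along_z, half)
```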


The derivation of the spatial correlation in this section is based on the 
assumption of having a ULA deployed along the z-axis. Example 5.3 shows that 
we can rotate the coordinate system arbitrarily and get the same result. Only 
the antenna spacing matters when determining the correlation in an isotropic 
rich multipath environment. For example, a ULA deployed in the horizontal 
plane will also give rise to Rayleigh fading with the spatial correlation given 
by (5.24). The correlation is zero if the antennas are half-wavelength-spaced 
(or an integer times that), but not otherwise.

There is a connection between the preferable spacing between antennas 
and the classical sampling theorem in Lemma 2.8. The sampling theorem of 
complex time-domain signals says we can reconstruct a signal with bandwidth 
B (counting both positive and negative frequencies) from samples spaced apart 
by $1/B$ in time. As the signal bandwidth is typically distributed between $-B/2$ and $B/2$, the samples are taken twice per period of the largest signal frequency $B/2$. What we have observed in this section is that we should sample a wireless signal in space using antennas spaced $\lambda/2$ apart. We recall from Section 2.8.3 that a signal with wavelength $\lambda$ has the spatial frequencies $\pm 1/\lambda$ in the direction of propagation. In contrast, spatial frequencies in the range $(-1/\lambda, 1/\lambda)$ can be observed in other directions. In an isotropic rich multipath environment where signal components impinge on the ULA from all possible angular directions, the channel will contain all spatial frequencies from $-1/\lambda$ to $1/\lambda$ (not only one frequency as in Chapter 4). We can say that the spatial bandwidth is $2/\lambda$, and the sampling theorem then recommends taking samples that are spatially separated by $\lambda/2$ (i.e., twice per period). This is why the ULA should have that antenna spacing.

The conclusion from the analysis above is the following. If a ULA with $M$ antennas and $\Delta = \lambda/2$ spacing receives a signal in a rich multipath environment (with scatterers being equally distributed over all angular directions), then the SIMO channel $\mathbf{h} = [h_1, \ldots, h_M]^{\mathrm{T}}$ contains independent entries that are equally distributed according to (5.21). We can write this distribution in vector form as

$$\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta \mathbf{I}_M). \tag{5.26}$$


This channel model is known as i.i.d. Rayleigh fading and can be utilized for 
SIMO and MISO channels. 


5.1.3 Independent Rayleigh Fading in MIMO Channels 


We can extend this result to MIMO channels where both the transmitter and receiver are equipped with ULAs with $\Delta = \lambda/2$ as antenna spacing. If each array is surrounded by many isotropically distributed scatterers (according to the conditions above), then every entry $h_{m,k}$ of the channel matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ will be independent and identically distributed as

$$h_{m,k} \sim \mathcal{N}_{\mathbb{C}}(0, \beta). \tag{5.27}$$

If the antennas are arranged in other ways, the entries of $\mathbf{H}$ will generally be correlated. For example, MIMO systems with UPAs always feature spatial correlation because one cannot achieve $\lambda/2$-spacing along the many diagonals in the arrays. We will focus on i.i.d. fading in this chapter since this is practically achievable and analytically tractable.
Example 5.4. What is the rank of $\mathbf{H}$ in i.i.d. Rayleigh fading?

The rank of a matrix is equal to the maximal number of linearly independent columns. The matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ has $K$ column vectors from $\mathbb{C}^M$. We begin by considering the case when $K \le M$, so there are no more columns than rows. The probability that a collection of $K \le M$ randomly generated columns will happen to be linearly dependent is zero when the randomness is independent and originates from a continuous distribution. The formal proof builds on generating random realizations for all entries of $\mathbf{H}$ except the last one: $h_{M,K}$. For the last column to be a linear combination of the previous ones, there will only be one or a few discrete values that $h_{M,K}$ can take (or there is no value at all). The probability of obtaining one of a few specific values from a continuous distribution is zero.

If $K > M$, then the first $M$ columns will be linearly independent with probability one, and the same holds for any subset of $M$ columns from $\mathbf{H}$. Hence, we will get a realization of $\mathbf{H}$ that has the maximum rank $\min(M, K)$ with probability one. This also means that a MIMO channel with i.i.d. Rayleigh fading can achieve the maximum multiplexing gain $r = \min(M, K)$.
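The probability-one argument in Example 5.4 can be observed numerically. The sketch below is a hypothetical illustration: it draws $\mathbf{H}$ with i.i.d. $\mathcal{N}_{\mathbb{C}}(0, 1)$ entries as in (5.27) for a few matrix dimensions and computes the numerical rank with Gaussian elimination (partial pivoting and a fixed tolerance, both our own choices). As a contrast, it also shows that a deliberately duplicated column lowers the rank.

```python
import random

def matrix_rank(A, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting (complex entries)."""
    A = [row[:] for row in A]                 # work on a copy
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue                          # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for i in range(rank + 1, rows):
            factor = A[i][c] / A[rank][c]
            for k in range(c, cols):
                A[i][k] -= factor * A[rank][k]
        rank += 1
    return rank

def iid_rayleigh(M, K, beta, rng):
    """M x K matrix with i.i.d. N_C(0, beta) entries, as in (5.27)."""
    std = (beta / 2) ** 0.5
    return [[complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(K)]
            for _ in range(M)]

rng = random.Random(4)
ranks = {}
for M, K in ((4, 2), (4, 4), (2, 6)):
    ranks[(M, K)] = matrix_rank(iid_rayleigh(M, K, 1.0, rng))
    print(f"M = {M}, K = {K}: rank {ranks[(M, K)]}, min(M, K) = {min(M, K)}")

# Contrast: forcing the second column to copy the first makes the columns dependent
H = iid_rayleigh(4, 2, 1.0, rng)
for row in H:
    row[1] = row[0]
dup_rank = matrix_rank(H)
print(f"rank with duplicated column: {dup_rank}")
```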


5.2 Slow and Fast Fading Versus the Channel Coherence Time 


When the channel capacity was analyzed in Chapter 3, it was assumed that
the channels are fixed throughout the transmission and known at both the 
transmitter and receiver. These are reasonable assumptions for LOS channels 
but not necessarily for fading channels. Recall from Definition 2.7 that the 
capacity describes the number of bits per second that can be “communicated 
with arbitrarily low error probability as the number of symbols in the packet 
approaches infinity”. The second part of this sentence is crucial in the context 
of fading channels: When we send a long packet, how many random realizations 
of the fading channels will we observe in the meantime? 

The answer to this question depends on many factors, such as the packet 
size, the geometry of the propagation environment, and the mobility of the 
transmitter, receiver, and objects that interact with the waves. We will quantify 
the time a channel coefficient is approximately constant to shed light on this. 

The worst-case scenario for channel variations was identified in Example 5.1. 
A practical situation where the same thing occurs is illustrated in Figure 5.7. 
The transmitter can reach the receiver via two reflecting objects, although the 
LOS path is blocked. Initially, the two propagation paths are of equal length 
$d$ (i.e., $d_1 = d_2 = d$) and have the same attenuation $\alpha$. The receiver then moves towards object 2 at a speed of $v$ m/s without changing the attenuation (for simplicity). Hence, the two propagation distances can be expressed as functions of the time $t$ as

$$d_1(t) = d + vt, \quad d_2(t) = d - vt. \tag{5.28}$$
Figure 5.7: An NLOS SISO channel with two propagation paths that are initially of equal
length, but the receiver then moves towards object 2 at a speed of v m/s. 


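The SISO channel coefficient in (5.1) also becomes a function of time:

$$h(t) = \alpha\left(e^{-j2\pi \frac{d_1(t) - d}{\lambda}} + e^{-j2\pi \frac{d_2(t) - d}{\lambda}}\right) = \alpha\left(e^{-j2\pi \frac{vt}{\lambda}} + e^{j2\pi \frac{vt}{\lambda}}\right) = 2\alpha \cos\left(2\pi \frac{vt}{\lambda}\right). \tag{5.29}$$

We can notice several things from this expression. Firstly, the two paths have aligned phases at $t = 0$ and cancel out when $vt = \lambda/4$, which happens at the time $t = T_c$, where

$$T_c = \frac{\lambda}{4v}. \tag{5.30}$$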

This is called the channel coherence time since it represents the shortest time to move from constructive superposition to a deep fade. The expression in (5.30) is often used to approximate the time a channel response remains approximately constant. The coherence time is proportional to the wavelength and inversely proportional to the speed of motion. One can rightfully question whether the proportionality constant in (5.30) should be 1/4 because the channel in (5.29) will change drastically from $h(0) = 2\alpha$ to $h(T_c) = 0$ in that time period. On the other hand, we considered a worst-case scenario that is unlikely to happen in practice, which is why (5.30) is nonetheless a common rule-of-thumb [26, Sec. 2.1.4].⁶

The second observation is that the phase-shifts in (5.29) due to mobility are $e^{\pm j2\pi vt/\lambda} = e^{\pm j2\pi f_c \frac{v}{c} t}$, which corresponds to shifting the instantaneous frequency of the received signal by $\pm f_c v/c$. These are known as Doppler shifts.


⁶ One can find alternative definitions of the coherence time in other textbooks; for example,
based on maintaining temporal correlation above 0.5 [61] or based on the sampling theorem [3]. 
Example 5.5. Consider transmitting a data packet with a time duration of 100 ms. The communication takes place in the 3 GHz band (i.e., $\lambda = 0.1$ m). What is the coherence time if $v = 0.1$ m/s or $v = 25$ m/s?

The two speeds correspond to slow indoor mobility and driving on a highway, respectively. The coherence time in (5.30) becomes

$$T_c = \frac{\lambda}{4v} = \frac{0.1}{4 \cdot 0.1} = 250 \text{ ms} \quad \text{if } v = 0.1 \text{ m/s}, \tag{5.31}$$

$$T_c = \frac{\lambda}{4v} = \frac{0.1}{4 \cdot 25} = 1 \text{ ms} \quad \text{if } v = 25 \text{ m/s}. \tag{5.32}$$


In the former case, the time duration of the packet is substantially smaller 
than the coherence time; thus, the channel will be approximately constant 
throughout the transmission. In the latter case, the coherence time is 100 
times shorter than the duration of the data packet; thus, the communication 
will be subject to roughly 100 different channel realizations. 
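The arithmetic in Example 5.5 can be reproduced with a few lines of Python. This is a minimal sketch; the function names are our own, and the Doppler shift is computed as $f_c v/c = v/\lambda$ using $\lambda = c/f_c$.

```python
def coherence_time(wavelength, speed):
    """Rule-of-thumb coherence time T_c = lambda / (4 v) from (5.30), in seconds."""
    return wavelength / (4 * speed)

def doppler_shift(wavelength, speed):
    """Magnitude of the Doppler shift f_c v / c, written as v / lambda since lambda = c / f_c."""
    return speed / wavelength

packet_duration = 0.1   # 100 ms packet from Example 5.5
wavelength = 0.1        # 3 GHz band
for speed in (0.1, 25.0):
    tc = coherence_time(wavelength, speed)
    print(f"v = {speed:>4} m/s: T_c = {tc * 1e3:.1f} ms, "
          f"packet/T_c = {packet_duration / tc:.1f}, "
          f"Doppler shift = {doppler_shift(wavelength, speed):.0f} Hz")
# v =  0.1 m/s: T_c = 250.0 ms, packet/T_c = 0.4, Doppler shift = 1 Hz
# v = 25.0 m/s: T_c = 1.0 ms, packet/T_c = 100.0, Doppler shift = 250 Hz
```

The ratio packet/$T_c$ gives the rough number of channel realizations observed during the packet: less than one at walking speed and about 100 at highway speed.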


An additional perspective on the coherence time concept can be obtained by revisiting the rich multipath environment from Section 5.1.2. The channel response at a given location is then a realization of a complex Gaussian random variable: $\mathcal{N}_{\mathbb{C}}(0, \beta)$. Suppose the receiver starts at an arbitrary location at time 0 and then moves along a straight line at the speed $v$ m/s. At the time $t$, it will be at a location $\delta = vt$ meters away from the initial location. If we let $h(0)$ and $h(t)$ denote the channel realizations at these locations, we basically have a ULA with antennas separated by $\delta$, except that the receive antenna is not simultaneously at both locations. It then follows from (5.25) that the temporal channel correlation is

$$\mathbb{E}\{h(0)h^*(t)\} = \beta\, \mathrm{sinc}\left(\frac{2\delta}{\lambda}\right) = \beta\, \mathrm{sinc}\left(\frac{2vt}{\lambda}\right). \tag{5.33}$$

If we continue using $T_c = \frac{\lambda}{4v}$ from (5.30) as the channel coherence time definition, the correlation in (5.33) becomes $\beta\, \mathrm{sinc}(2vT_c/\lambda) = \beta\, \mathrm{sinc}(1/2) \approx 0.64\beta$. One way to interpret this correlation value is that

$$h(T_c) \approx 0.64\, h(0) + \sqrt{1 - 0.64^2}\, \mathcal{N}_{\mathbb{C}}(0, \beta), \tag{5.34}$$

which is a linear combination of the old channel $h(0)$ and a new independent realization of the complex Gaussian distribution. The coefficient $\sqrt{1 - 0.64^2}$ ensures that $\mathbb{E}\{|h(T_c)|^2\} = \beta$. We can expect the random channel fluctuations to be small within the coherence time. Beyond that time interval, the temporal correlation reduces more rapidly and becomes $\beta\, \mathrm{sinc}(1) = 0$ when $t = \frac{\lambda}{2v}$.
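The interpretation in (5.34) can be verified by sampling. The sketch below generates pairs $(h(0), h(T_c))$ according to (5.34) with $\beta = 1$ and estimates the correlation and the average gain. One deliberate deviation: it uses the exact value $\mathrm{sinc}(1/2) = 2/\pi$ instead of the rounded 0.64, and the sample size is a hypothetical choice.

```python
import math
import random

def complex_gaussian(beta, rng):
    """One sample from N_C(0, beta): real and imaginary parts are N(0, beta/2)."""
    std = math.sqrt(beta / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))

rho = 2 / math.pi   # sinc(1/2) = sin(pi/2) / (pi/2), rounded to 0.64 in (5.34)
beta = 1.0
rng = random.Random(5)
N = 100_000
h0 = [complex_gaussian(beta, rng) for _ in range(N)]
# (5.34): old channel scaled by rho plus an independent new realization
hT = [rho * a + math.sqrt(1 - rho ** 2) * complex_gaussian(beta, rng) for a in h0]

corr = sum(a * b.conjugate() for a, b in zip(h0, hT)) / N   # estimates E{h(0) h*(T_c)}
gain = sum(abs(b) ** 2 for b in hT) / N                       # estimates E{|h(T_c)|^2}
print(f"correlation ~ {corr.real:.3f} (target {rho * beta:.3f}), gain ~ {gain:.3f}")
```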

Figure 5.8 shows random realizations of $|h(t)|$, as a function of the time $t$, that are generated based on the temporal correlation model in (5.33). We consider $\beta = 1$, $\lambda = 0.1$ m, and the same two speeds of motion as in Example 5.5: $v = 0.1$ m/s or $v = 25$ m/s. We notice that the channel magnitude is almost constant over the considered 100 ms time interval when the speed is low, while there are very rapid variations when the speed is high.

Figure 5.8: A sequence of random realizations of $|h(t)|$ that was generated using the temporal correlation model in (5.33). The speed $v$ determines how quickly the channel changes over time.

In conclusion, depending on how quickly things are moving in the propagation environment, a fading channel either takes one random realization throughout the time interval required to send a data packet, or the channel magnitude oscillates rapidly. Even in the latter case, there is a channel coherence time within which the channel is approximately constant. Hence, we can treat a fading channel as being piecewise constant over short blocks of time, jumping between different random fading realizations across these blocks. Figure 5.9 illustrates how a continuously time-varying channel can be approximated to be piecewise constant in time intervals that match the coherence time of the channel. If we further assume that the random realizations are independent across these blocks but originate from the same distribution, we obtain what is known as the block fading model.
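The block fading model can be sketched as a simple generator. In the illustration below, the sampling period (0.1 ms) and the packet length are hypothetical, while $T_c = 1$ ms matches the highway case in Example 5.5. Each block holds one i.i.d. $\mathcal{N}_{\mathbb{C}}(0, \beta)$ realization for $T_c$ seconds before jumping to a new one.

```python
import math
import random

def block_fading(num_samples, samples_per_block, beta, rng):
    """Piecewise-constant channel: a new i.i.d. N_C(0, beta) value in every block."""
    std = math.sqrt(beta / 2)
    h = []
    while len(h) < num_samples:
        value = complex(rng.gauss(0, std), rng.gauss(0, std))
        h.extend([value] * samples_per_block)
    return h[:num_samples]

rng = random.Random(6)
sample_period = 1e-4                              # hypothetical: one sample every 0.1 ms
tc = 1e-3                                         # T_c = 1 ms (v = 25 m/s in Example 5.5)
samples_per_block = round(tc / sample_period)     # 10 samples per coherence block
h = block_fading(1000, samples_per_block, 1.0, rng)   # 100 ms of channel samples
print(f"{len(h)} samples, {len(set(h))} distinct fading realizations")
```

The 100 ms interval is covered by 100 independent realizations, matching the count in Example 5.5.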


5.2.1 Definitions of Slow and Fast Fading 


The relation between the channel coherence time and the packet length deter- 
mines how many fading realizations will be observed during communication. 
When studying the impact of fading on the channel capacity, two canonical 
setups (or extreme cases) are normally considered: 


1. Slow fading: The channel takes only one random realization throughout 
the entire transmission. 


2. Fast fading: The channel takes a new independent random realization at 
every time instance. 
Figure 5.9: The block fading model approximates a continuously time-varying channel as being piecewise constant over time. The channel response changes every $T_c$ seconds, based on the channel coherence time, and takes independent and identically distributed realizations.


We will study these cases separately in the remainder of this chapter. In 
both cases, the receiver is assumed to know the channel realization, while the 
transmitter does not. The motivation for this assumption is that the receiver 
can learn the channel realization after/during the transmission by analyzing 
the received signal. By contrast, the transmitter must decide how to transmit in advance, before the random fading realization is generated.

One can also relate the slow and fast fading concepts to the latency 
requirements of the communication link; that is, the time delay from when a bit is transmitted until it must be decoded at the receiver. For a given channel
coherence time, we can choose between transmitting a relatively short data 
packet only exposed to one fading realization (i.e., slow fading) or a very 
long data packet exposed to many fading realizations (i.e., fast fading). Since 
the receiver cannot finish the data decoding until the entire packet has been 
received, the former option will result in lower latency, while the latter option 
will result in higher latency. On the other hand, we will observe later in 
this chapter that the performance loss due to channel fading is lower under 
fast-fading conditions, so it is the preferred operating regime whenever latency 
is of little concern. 


5.3 Capacity Concept with Slow Fading 


In the slow-fading scenario, the channel responses are constant throughout 
the communication, but their values are generated as realizations of random 
variables. We consider the transmission of a packet containing enough data to justify using the channel capacity as the performance metric. We further assume that the receiver knows the realization of the channel response, which we
refer to as having perfect CSI. The transmitter can enable channel estimation 
at the receiver by transmitting a known preamble, following the procedure 
described in Section 4.2.4. Regarding channel knowledge at the transmitter, 
there are two possible modes of operation. 

In a closed-loop system, the receiver can feed back its channel estimate 
to the transmitter, which will then also have perfect CSI. The capacity of 
such a channel can be computed as described in Chapter 3, with the only 
addition that the channel coefficients are now drawn randomly from a specific
distribution (e.g., i.i.d. Rayleigh fading).

This section considers open-loop systems, where the transmitter is unaware 
of the current channel realization but knows the statistical distribution. This
situation arises especially in systems where no reverse feedback link
exists (e.g., when broadcasting data to many unknown user devices) or when
the feedback functionality is too slow to provide the transmitter with CSI 
(e.g., when the latency requirements are strict). The capacity results from 
Chapter 3 cannot be applied under these circumstances; thus, a new capacity 
concept will be developed in this section. 

To this end, we begin by returning to the memoryless SISO channel that 
was initially described in (2.130): 


$$
y[l] = h\,x[l] + n[l]. \tag{5.35}
$$


We will mainly consider a Rayleigh fading channel where the channel response
$h$ is distributed as
$$
h \sim \mathcal{N}_{\mathbb{C}}(0, \beta) \tag{5.36}
$$


and takes only one realization throughout the communication. This might 
happen in practice when the transmitter and/or receiver are at random but 
fixed locations in a rich multipath environment. 
For a given realization $h$, we know from Corollary 2.1 that the (conditional)
capacity is
$$
C_h = \log_2\left(1 + \frac{q|h|^2}{N_0}\right) \text{ bit/symbol}, \tag{5.37}
$$
where the subscript $h$ indicates that we have conditioned on the realization
$h$. Since the receiver has perfect CSI, it can decode a signal transmitted using
any data rate $R < C_h$. The critical challenge in slow fading is that the
transmitter does not know the realization $h$, but only the statistics (i.e.,
Rayleigh fading with variance $\beta$). Hence, the transmitter needs to select a
data rate $R$ bit/symbol, encode its data at that rate, and then hope that the
communication will be successful so that the receiver can decode the data.
The randomness can give rise to two different events: 


• If $R < C_h$, the transmitter has selected a data rate below the capacity.
The communication will then be successful in the sense of achieving an
arbitrarily low packet error probability.

• If $R > C_h$, the transmitter has selected a data rate above the capacity.
The communication will then be unsuccessful in the sense of having a
very high packet error probability.

Figure 5.10: In a slow-fading scenario, the fading realization $h$ determines the supported
channel capacity $C_h$. The PDF of $C_h$ is shown in this figure for $h \sim \mathcal{N}_{\mathbb{C}}(0,1)$ and $q/N_0 = 1$.
For a given $R$, the outage probability $P_{\text{out}}(R)$ is the area under the curve for which $C_h < R$.


When the latter happens, the system is said to be in an outage. For a 
given rate R, we can define the outage probability: 


$$
P_{\text{out}}(R) = \Pr\{R > C_h\} = \Pr\left\{R > \log_2\left(1 + \frac{q|h|^2}{N_0}\right)\right\}. \tag{5.38}
$$
The outage probability is a strictly increasing function of $R$, and $P_{\text{out}}(0) = 0$.
Since the only rate guaranteed to provide zero packet error for any channel 
realization is R = 0, the channel capacity is strictly speaking equal to zero. 
We can nevertheless communicate over the channel, but the selection of 
R becomes a gamble. We can communicate relatively reliably by selecting 
a low R (resulting in a low outage probability), but then we will get little 
data through the channel. Alternatively, we can communicate unreliably by 
selecting a high R (resulting in a high outage probability). In this case, we 
can get a lot of data through the channel, but only on those few occasions 
when there is no outage. Figure 5.10 illustrates this situation by showing the
PDF of the capacity $C_h$ for $h \sim \mathcal{N}_{\mathbb{C}}(0,1)$ and $q/N_0 = 1$. The larger $R$ is, the
more probability mass lies under the curve between $C_h = 0$ and $C_h = R$.
The outage probability equals this probability mass.
We can compare the variations in Figure 5.10 with a non-fading/LOS
channel having the same average SNR: $\mathbb{E}\{q|h|^2/N_0\} = q\beta/N_0 = 1$. Such a channel
would have a capacity of $C = \log_2(1 + 1) = 1$ bit/symbol. The figure shows
that a fading channel can provide both larger and smaller values of Ch, which 
might give the impression that fading can be both positive and negative. 
Unfortunately, the adverse effect dominates in slow-fading scenarios since the 
transmitter does not know the value of Ch, so it must be very conservative 
when selecting R to avoid getting a large outage probability. 

The outage probability expression in (5.38) can be utilized along with any
fading distribution. We will now compute the probability by exploiting the
assumption that $h \sim \mathcal{N}_{\mathbb{C}}(0,\beta)$, which implies that $|h|^2$ has an exponential
distribution with the PDF $f_{|h|^2}(x)$ given in (5.10). In particular, it follows that
$$
\Pr\{|h|^2 \leq z\} = \int_0^z f_{|h|^2}(t)\, dt = 1 - e^{-z/\beta}. \tag{5.39}
$$


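By rearranging the expression in (5.38), we can obtain
$$
\begin{aligned}
P_{\text{out}}(R) &= \Pr\left\{R > \log_2\left(1 + \frac{q|h|^2}{N_0}\right)\right\} = \Pr\left\{2^R - 1 > \frac{q|h|^2}{N_0}\right\} \\
&= \Pr\left\{|h|^2 < \frac{N_0(2^R - 1)}{q}\right\} = 1 - e^{-\frac{N_0(2^R - 1)}{q\beta}}.
\end{aligned} \tag{5.40}
$$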
If we denote the average SNR (similar to the case of non-fading channels) as
$$
\mathrm{SNR} = \mathbb{E}\left\{\frac{q|h|^2}{N_0}\right\} = \frac{q\beta}{N_0}, \tag{5.41}
$$


then the outage probability in (5.40) can be expressed as 


$$
P_{\text{out}}(R) = 1 - e^{-\frac{2^R - 1}{\mathrm{SNR}}}. \tag{5.42}
$$


This is a decreasing function of the SNR, which is logical since a higher SNR 
should make it easier for the channel to support a given rate R. We want 
to operate communication systems at relatively high SNRs to achieve high 
data rates. Hence, analyzing the scaling behavior of the outage probability 
in the high-SNR regime is essential. We can utilize the first-order Taylor
approximation $e^{-x} \approx 1 - x$ for $x \approx 0$ to observe that
$$
P_{\text{out}}(R) \approx 1 - \left(1 - \frac{2^R - 1}{\mathrm{SNR}}\right) = \frac{2^R - 1}{\mathrm{SNR}} \tag{5.43}
$$
when the SNR is high. Hence, the outage probability is proportional to $\mathrm{SNR}^{-1}$
in the high-SNR regime and goes to zero as $\mathrm{SNR} \to \infty$. We will show later
that we can improve this high-SNR behavior by utilizing multiple antennas,
but we will first provide an alternative way of formulating the outage situation.
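The closed form (5.42) and the high-SNR approximation (5.43) can be cross-checked numerically. The following is a minimal sketch that uses only the Python standard library. It assumes $\beta = 1$ and draws $h \sim \mathcal{N}_{\mathbb{C}}(0,\beta)$ by sampling the real and imaginary parts with variance $\beta/2$ each. The function names and the Monte Carlo sample size are illustrative choices and do not come from the text.

```python
import math
import random

def outage_exact(rate, snr):
    """Closed-form outage probability (5.42) for SISO Rayleigh fading."""
    return 1.0 - math.exp(-(2.0**rate - 1.0) / snr)

def outage_high_snr(rate, snr):
    """High-SNR approximation (5.43), proportional to 1/SNR."""
    return (2.0**rate - 1.0) / snr

def outage_monte_carlo(rate, snr, beta=1.0, trials=100_000, seed=1):
    """Empirical Pr{R > log2(1 + q|h|^2/N0)} with h ~ N_C(0, beta) and q*beta/N0 = snr."""
    rng = random.Random(seed)
    q_over_n0 = snr / beta
    sigma = math.sqrt(beta / 2)  # std of the real and imaginary parts
    outages = 0
    for _ in range(trials):
        h = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        if rate > math.log2(1 + q_over_n0 * abs(h) ** 2):
            outages += 1
    return outages / trials

for snr_db in (0, 10, 20):
    snr = 10 ** (snr_db / 10)
    print(snr_db, outage_exact(1, snr), outage_high_snr(1, snr), outage_monte_carlo(1, snr))
```

At 0 dB, the approximation (5.43) is poor. At 20 dB, it agrees with (5.42) to within about one percent, which illustrates the $\mathrm{SNR}^{-1}$ scaling.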
Example 5.6. Suppose the channel response $h$ has a fading distribution such
that $|h|^2$ is uniformly distributed between $0$ and $2\beta$. What is the average
SNR? How does the outage probability depend on the SNR?

The mean value of a uniform distribution with support in $[0, 2\beta]$ equals
the interval’s midpoint: $\beta$. Hence, the average SNR is
$$
\mathrm{SNR} = \mathbb{E}\left\{\frac{q|h|^2}{N_0}\right\} = \frac{q\beta}{N_0}, \tag{5.44}
$$
which is the same as in (5.41). The assumed channel distribution implies that
$$
\Pr\{|h|^2 \leq z\} = \frac{z}{2\beta}, \quad z \in [0, 2\beta]. \tag{5.45}
$$


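By utilizing this property, we can calculate the outage probability in (5.38) as
$$
\begin{aligned}
P_{\text{out}}(R) &= \Pr\left\{R > \log_2\left(1 + \frac{q|h|^2}{N_0}\right)\right\} = \Pr\left\{|h|^2 < \frac{N_0(2^R - 1)}{q}\right\} \\
&= \begin{cases} \dfrac{2^R - 1}{2\,\mathrm{SNR}}, & R \in [0, \log_2(1 + 2\,\mathrm{SNR})], \\ 1, & R > \log_2(1 + 2\,\mathrm{SNR}). \end{cases}
\end{aligned} \tag{5.46}
$$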


This expression is proportional to $\mathrm{SNR}^{-1}$, just as the outage probability in
(5.43) with Rayleigh fading. However, the proportionality constant is only 
half as large, so there is a smaller risk of an outage when having a uniform 
fading distribution than with Rayleigh fading. 
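The factor-of-two difference can be checked directly by comparing (5.46) with (5.42) at the same average SNR. This is a small sketch. The helper names are hypothetical, and $R = 1$ bit/symbol is an arbitrary example rate.

```python
import math

def outage_uniform(rate, snr):
    """Outage probability (5.46) when |h|^2 is uniform on [0, 2*beta]."""
    return min((2.0**rate - 1.0) / (2.0 * snr), 1.0)  # saturates at 1 for R > log2(1 + 2 SNR)

def outage_rayleigh(rate, snr):
    """Outage probability (5.42) under Rayleigh fading with the same average SNR."""
    return 1.0 - math.exp(-(2.0**rate - 1.0) / snr)

for snr_db in (0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)
    ratio = outage_uniform(1, snr) / outage_rayleigh(1, snr)
    print(f"{snr_db:>2} dB: uniform / Rayleigh = {ratio:.4f}")
```

The printed ratio approaches 1/2 as the SNR grows, in line with the halved proportionality constant.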


5.3.1 ε-Outage Capacity


Instead of specifying the desired rate $R > 0$ and computing the resulting
outage probability $P_{\text{out}}(R)$, we can specify a desired outage probability $\epsilon > 0$
and compute the resulting maximum rate that can be supported over the
channel. That rate is called the $\epsilon$-outage capacity and will be denoted as $C_\epsilon$.
It represents a capacity that can be achieved with probability $1 - \epsilon$.

In the considered Rayleigh fading SISO setup, we can set $\epsilon = P_{\text{out}}(R)$ and
solve for $R$ to find $C_\epsilon$. By utilizing (5.42), we obtain
$$
\begin{aligned}
\epsilon = 1 - e^{-\frac{2^R - 1}{\mathrm{SNR}}} &\iff 1 - \epsilon = e^{-\frac{2^R - 1}{\mathrm{SNR}}} \\
&\iff \frac{2^R - 1}{\mathrm{SNR}} = -\ln(1 - \epsilon) \\
&\iff 2^R - 1 = \mathrm{SNR}\ln\left((1 - \epsilon)^{-1}\right) \\
&\iff R = \log_2\left(1 + \mathrm{SNR}\ln\left((1 - \epsilon)^{-1}\right)\right).
\end{aligned} \tag{5.47}
$$

Figure 5.11: The $\epsilon$-outage capacity $C_\epsilon$ of a Rayleigh fading channel is compared with the
capacity of an AWGN channel when the (average) SNR is $\mathrm{SNR} = 0$ dB. The $\epsilon$-outage capacity is
much smaller than the AWGN capacity in the practically interesting range of outage probabilities
(e.g., $\epsilon \leq 0.1$) but is higher than the AWGN capacity for $\epsilon > 1 - e^{-1}$.


Hence, the $\epsilon$-outage capacity is
$$
C_\epsilon = \log_2\left(1 + \mathrm{SNR}\ln\left((1 - \epsilon)^{-1}\right)\right) \tag{5.48}
$$
and depends on the SNR and $\epsilon$. By computing the first-order derivatives of $C_\epsilon$
with respect to the SNR and $\epsilon$, one can show that $C_\epsilon$ is an increasing
function of both the SNR and the outage probability.
Naturally, a higher SNR allows us to send more data. The reason that $C_\epsilon$
increases with $\epsilon$ is that we can then select a rate that is supported by the
channel only when we get “good” fading realizations.
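Because (5.48) is obtained by inverting (5.42), substituting $C_\epsilon$ back into the outage formula should return $\epsilon$. The sketch below checks this numerically. The 10 dB SNR and the set of $\epsilon$ values are arbitrary examples, and the function names are hypothetical.

```python
import math

def eps_outage_capacity(eps, snr):
    """epsilon-outage capacity (5.48) of a SISO Rayleigh fading channel, in bit/symbol."""
    return math.log2(1.0 + snr * math.log(1.0 / (1.0 - eps)))

def outage(rate, snr):
    """Outage probability (5.42)."""
    return 1.0 - math.exp(-(2.0**rate - 1.0) / snr)

snr = 10.0  # 10 dB, an arbitrary example value
for eps in (0.01, 0.1, 0.5):
    c_eps = eps_outage_capacity(eps, snr)
    # Plugging C_eps back into (5.42) should return eps
    print(eps, round(c_eps, 4), round(outage(c_eps, snr), 6))
```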

It is instructive to compare $C_\epsilon$ with the capacity $C = \log_2(1 + \mathrm{SNR})$
of a non-fading AWGN channel having the same (average) SNR. The only
difference is the additional factor $\ln((1 - \epsilon)^{-1})$ that multiplies the SNR
in (5.48). Interestingly, this factor can be both smaller and larger than one.
More precisely, it is smaller than one if $\epsilon < 1 - e^{-1} \approx 0.63$, because this is the
probability that $|h|^2$ is smaller than its average value $\mathbb{E}\{|h|^2\} = \beta$. In these
cases, the $\epsilon$-outage capacity is smaller than the AWGN capacity. The opposite
is true for $\epsilon > 1 - e^{-1}$ because then the communication is only successful
when the channel realization is stronger than its average.

Figure 5.11 compares $C_\epsilon$ and $C = \log_2(1 + \mathrm{SNR})$ for $\mathrm{SNR} = 0$ dB. The $\epsilon$-outage
capacity is much smaller than the AWGN capacity for the vast majority
of $\epsilon$-values. Typical desired values of the outage probability are $\epsilon \leq 0.1$, for
which the $\epsilon$-outage capacity is less than 15% of the AWGN capacity. Hence,
fading is generally considered a detrimental property of wireless channels.
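The two numerical claims for $\mathrm{SNR} = 0$ dB can be reproduced in a few lines. These claims are that $C_\epsilon/C$ stays below 15% for $\epsilon = 0.1$, and that $C_\epsilon = C$ exactly at $\epsilon = 1 - e^{-1}$. This is a sketch under the same assumptions as (5.48). The variable names are illustrative.

```python
import math

def eps_outage_capacity(eps, snr):
    """epsilon-outage capacity (5.48)."""
    return math.log2(1.0 + snr * math.log(1.0 / (1.0 - eps)))

snr = 1.0                         # SNR = 0 dB, as in Figure 5.11
awgn = math.log2(1.0 + snr)       # AWGN capacity: 1 bit/symbol
ratio = eps_outage_capacity(0.1, snr) / awgn
crossover = 1.0 - math.exp(-1.0)  # eps at which ln(1/(1 - eps)) = 1

print(f"C_eps / C at eps = 0.1: {ratio:.3f}")
print(f"C_eps equals C at eps = {crossover:.3f}")
```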
Figure 5.12: The ratio between the $\epsilon$-outage capacity $C_\epsilon$ in (5.48) and the AWGN capacity
$C = \log_2(1 + \mathrm{SNR})$ for different SNRs. The fading channel achieves a higher fraction of the
AWGN capacity at high SNR, but the convergence in (5.49) is not visible.


However, Figure 5.11 also shows that for large values of $\epsilon$, the $\epsilon$-outage capacity
is larger than the AWGN channel capacity. If reliability is unimportant, the
fading can occasionally be exploited to achieve high rates. However, the
transmitter needs to know when the channel has good realizations, which is
inconsistent with the considered setup.


Figure 5.11 considered a rather low SNR value. The difference between
$C_\epsilon$ and $C$ depends on the SNR. The fraction of the AWGN capacity that is
achieved with a fading channel converges as
$$
\frac{C_\epsilon}{C} = \frac{\log_2\left(1 + \mathrm{SNR}\ln\left((1 - \epsilon)^{-1}\right)\right)}{\log_2(1 + \mathrm{SNR})} \to 1 \tag{5.49}
$$
when $\mathrm{SNR} \to \infty$, where the limit can be established using L’Hospital’s rule.
Hence, the relative difference vanishes asymptotically at high SNR.


Figure 5.12 shows the fraction $C_\epsilon/C$ from (5.49) for a range of practical SNR
values. Two practical outage probability values are considered: $\epsilon = 0.1$ and
$\epsilon = 0.01$. The figure shows that fading channels operate closer to the AWGN
capacity at higher SNRs; however, the convergence to the upper limit in (5.49)
is not apparent in the considered SNR range. An SNR of hundreds of dB is
necessary to approach the upper limit for these values of $\epsilon$. The conclusions
are that channel fading has a detrimental impact on the capacity at practical
SNRs and that the asymptotic result in (5.49) is not practically useful.
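The slow convergence in (5.49) becomes clear when the ratio is evaluated far beyond practical SNRs. The sketch below does this. The SNR grid, which extends to an unrealistic 300 dB, is an illustrative choice, and `capacity_ratio` is a hypothetical helper name.

```python
import math

def capacity_ratio(eps, snr_db):
    """Fraction C_eps / C in (5.49) at a given SNR in dB."""
    snr = 10 ** (snr_db / 10)
    return math.log2(1 + snr * math.log(1 / (1 - eps))) / math.log2(1 + snr)

snr_grid_db = (0, 10, 20, 30, 40, 100, 300)
for eps in (0.1, 0.01):
    print(eps, [round(capacity_ratio(eps, s), 3) for s in snr_grid_db])
```

For $\epsilon = 0.01$, the ratio is still only about one half at 40 dB. For $\epsilon = 0.1$, it remains a few percent below one even at 300 dB.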
5.3.2 Receive Diversity in SIMO Systems


The issue with slow fading channels is the substantial risk that the channel 
coefficient is in a deep fade; thus, we need to select a low rate value of R to 
keep the outage probability reasonably low. This problem can be mitigated by 
using multiple antennas that have been deployed to observe different fading 
realizations. In this section, we consider a SIMO system with i.i.d. Rayleigh 
fading. We will demonstrate that having multiple independent channel coef- 
ficients under slow fading is beneficial since there is a good chance that at 
least one of the M antennas experiences a decent channel realization. 

The SIMO channel $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta\mathbf{I}_M)$ is considered in this section. For a
given channel realization, we can utilize (3.22) to obtain the (conditional)
capacity
$$
C_{\mathbf{h}} = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right). \tag{5.50}
$$
This expression does not depend on the individual entries of $\mathbf{h}$ but only on
the squared norm $\|\mathbf{h}\|^2$. The norm is only small when all the entries of $\mathbf{h}$
are simultaneously small. Under i.i.d. Rayleigh fading, this variable has the
scaled $\chi^2(2M)$-distribution introduced in Section 2.2.5. The PDF of $\|\mathbf{h}\|^2$
was stated in (2.99) as
$$
f_{\|\mathbf{h}\|^2}(x) = \frac{x^{M-1}e^{-x/\beta}}{\beta^M (M-1)!}, \quad \text{for } x \geq 0. \tag{5.51}
$$
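We can define the outage probability when the transmitter uses the rate $R$ as
$$
P_{\text{out}}(R) = \Pr\{R > C_{\mathbf{h}}\} = \Pr\left\{R > \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)\right\} = \Pr\left\{\|\mathbf{h}\|^2 < \frac{N_0(2^R - 1)}{q}\right\}. \tag{5.52}
$$
The exact outage probability can then be computed using (5.51) as
$$
\begin{aligned}
P_{\text{out}}(R) &= \int_0^{\frac{N_0(2^R-1)}{q}} \frac{x^{M-1}e^{-x/\beta}}{\beta^M(M-1)!}\, dx \\
&= \left[-\frac{x^{M-1}e^{-x/\beta}}{\beta^{M-1}(M-1)!}\right]_0^{\frac{N_0(2^R-1)}{q}} + \int_0^{\frac{N_0(2^R-1)}{q}} \frac{x^{M-2}e^{-x/\beta}}{\beta^{M-1}(M-2)!}\, dx = \cdots \\
&= 1 - e^{-\frac{N_0(2^R-1)}{q\beta}} \sum_{m=0}^{M-1} \frac{1}{m!}\left(\frac{N_0(2^R-1)}{q\beta}\right)^m = 1 - e^{-\frac{2^R-1}{\mathrm{SNR}}} \sum_{m=0}^{M-1} \frac{1}{m!}\left(\frac{2^R-1}{\mathrm{SNR}}\right)^m
\end{aligned} \tag{5.53}
$$
by integrating by parts repeatedly and then using the SNR definition $\mathrm{SNR} = q\beta/N_0$
from (5.41). This expression is complicated to analyze since there are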
many terms. We are primarily interested in the behavior at high SNRs since
this is where we want the system to operate and where outages can be
avoided through a good system design. In those cases, the outage probability
is determined by the behavior of $f_{\|\mathbf{h}\|^2}(x)$ for $x \approx 0$. By utilizing the fact that
$e^{-x/\beta} \leq 1$ for $x \geq 0$, we obtain the inequality
$$
f_{\|\mathbf{h}\|^2}(x) \leq \frac{x^{M-1}}{\beta^M (M-1)!}. \tag{5.54}
$$


We can expect equality to hold approximately in (5.54) when $x \approx 0$. Hence,
a tight upper bound on the outage probability in (5.52) can be computed as
$$
P_{\text{out}}(R) = \int_0^{\frac{N_0(2^R-1)}{q}} f_{\|\mathbf{h}\|^2}(x)\, dx \leq \int_0^{\frac{N_0(2^R-1)}{q}} \frac{x^{M-1}}{\beta^M(M-1)!}\, dx = \left(\frac{N_0(2^R - 1)}{q}\right)^M \frac{1}{\beta^M M!}. \tag{5.55}
$$
By using the SNR definition in (5.41), we can write (5.55) as
$$
P_{\text{out}}(R) \leq \left(\frac{2^R - 1}{\mathrm{SNR}}\right)^M \frac{1}{M!}, \tag{5.56}
$$
where the upper bound is proportional to $\mathrm{SNR}^{-M}$ and is approximately
achieved when the SNR is high. Hence, the outage probability reduces with
the SNR much more rapidly when multiple antennas exist. This is known as 
a spatial diversity gain and M is the diversity order. The more antennas are 
used, the less probable it is that all the antennas are simultaneously in deep 
fades; each independent channel coefficient contributes +1 to the diversity 
order. Moreover, the higher the SNR is, the deeper the fade must be to get 
an outage for a given value of R, and this becomes even less probable when 
there are multiple antennas. 
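The exact expression (5.53) and the bound (5.56) are straightforward to evaluate side by side. The sketch below assumes $R = 1$ bit/symbol, as in the upcoming Figure 5.13, and uses hypothetical helper names. It lets you confirm that the bound is never below the exact value, that the two coincide at high SNR, and that a 10 dB SNR increase divides the bound by $10^M$.

```python
import math

def outage_simo_exact(rate, snr, M):
    """Exact outage probability (5.53) for M receive antennas, i.i.d. Rayleigh fading."""
    a = (2.0**rate - 1.0) / snr
    return 1.0 - math.exp(-a) * sum(a**m / math.factorial(m) for m in range(M))

def outage_simo_bound(rate, snr, M):
    """Upper bound (5.56), proportional to SNR^(-M) and tight at high SNR."""
    a = (2.0**rate - 1.0) / snr
    return a**M / math.factorial(M)

for M in (1, 2, 4):
    for snr_db in (0, 10, 20):
        snr = 10 ** (snr_db / 10)
        print(M, snr_db, f"{outage_simo_exact(1, snr, M):.3e}", f"{outage_simo_bound(1, snr, M):.3e}")
```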

Figure 5.13 illustrates the diversity gain in a setup with i.i.d. Rayleigh
fading and $R = 1$ bit/symbol. Figure 5.13(a) shows the outage probability
for different SNRs with $M = 1$, $M = 2$, and $M = 4$ antennas. The outage
probabilities are roughly the same at low SNRs, while the curves behave
very differently at higher SNRs. We know that the outage probability is
proportional to $\mathrm{SNR}^{-M}$ at high SNR. This means that for every 10 dB that
the SNR increases, the outage probability is reduced by a factor of $10^M$.
Since logarithmic scales are used on both axes in Figure 5.13(a), this results
in lines with the slope $-M$. As the SNR increases, a steeper slope leads to
a rapidly lower outage probability. If we want to achieve $P_{\text{out}} = 10^{-3}$, then
we need $\mathrm{SNR} = 30$ dB with $M = 1$, $\mathrm{SNR} = 13$ dB with $M = 2$, and only
$\mathrm{SNR} = 4$ dB with $M = 4$.
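The SNR levels needed for $P_{\text{out}} = 10^{-3}$ can be computed by searching over the SNR in the exact expression (5.53). This works because the outage probability decreases monotonically with the SNR. The following is a minimal bisection sketch. The search bracket of -20 to 60 dB and the function names are assumptions made for illustration.

```python
import math

def outage_simo_exact(rate, snr, M):
    """Exact SIMO outage probability (5.53)."""
    a = (2.0**rate - 1.0) / snr
    return 1.0 - math.exp(-a) * sum(a**m / math.factorial(m) for m in range(M))

def required_snr_db(target, rate, M, lo_db=-20.0, hi_db=60.0, iters=100):
    """Bisection on the SNR in dB until the outage probability equals the target."""
    for _ in range(iters):
        mid = 0.5 * (lo_db + hi_db)
        if outage_simo_exact(rate, 10 ** (mid / 10), M) > target:
            lo_db = mid  # too many outages: a higher SNR is needed
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)

for M in (1, 2, 4):
    print(M, round(required_snr_db(1e-3, 1, M), 1))
```

The results come out close to 30.0, 13.4, and 3.7 dB, which the text rounds to 30, 13, and 4 dB.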


336 Non-Line-of-Sight Point-to-Point MIMO Channels 


(a) The exact outage probabilities with $M = 1$, $M = 2$, and $M = 4$ (horizontal axis: SNR [dB]).

(b) The exact outage probability and the upper bound in (5.56) for $M = 2$.

Figure 5.13: The outage probability is proportional to $\mathrm{SNR}^{-M}$ at high SNRs when communicating
over a SIMO channel with i.i.d. Rayleigh fading. In this case, we consider $R = 1$ bit/symbol.

The origin of the performance gain lies in the behavior of $\|\mathbf{h}\|^2$, and there
are two contributing phenomena. Firstly, the average gain
$$
\mathbb{E}\{\|\mathbf{h}\|^2\} = \sum_{m=1}^{M} \mathbb{E}\{|h_m|^2\} = M\beta \tag{5.57}
$$


is proportional to M (see Section 2.2.5 for further details), which is how the 
beamforming gain is manifested in NLOS channels. The gain is the same as for 
LOS channels, except that it now varies around the mean value M depending 
on the channel realizations instead of always having that exact value. Secondly, 
the variations in ||h||? around its mean value reduce in relative terms (i.e., 
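normalized by the average gain). The variance can be computed as
$$
\begin{aligned}
\mathrm{Var}\left\{\frac{\|\mathbf{h}\|^2}{\mathbb{E}\{\|\mathbf{h}\|^2\}}\right\} &= \mathbb{E}\left\{\left|\frac{\|\mathbf{h}\|^2}{\mathbb{E}\{\|\mathbf{h}\|^2\}} - 1\right|^2\right\} = \frac{\mathbb{E}\{\|\mathbf{h}\|^4\}}{M^2\beta^2} - \frac{\left(\mathbb{E}\{\|\mathbf{h}\|^2\}\right)^2}{M^2\beta^2} \\
&= \frac{M^2 + M}{M^2} - 1 = \frac{1}{M}
\end{aligned} \tag{5.58}
$$
by using (5.57) and the following result from (2.97): $\mathbb{E}\{\|\mathbf{h}\|^4\} = (M^2 + M)\beta^2$.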
We notice that the variance in (5.58) reduces with M. This is the statistical 
property that gives rise to the diversity gain. In general, the beamforming 
gain shifts the outage probability curves to the left in Figure 5.13(a) as we 
increase M, while the diversity gain makes the curves steeper. 
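Both statistical properties, the mean $M\beta$ in (5.57) and the normalized variance $1/M$ in (5.58), can be checked by simulation. The following Monte Carlo sketch assumes $\beta = 1$. The sample size, the seed, and the helper name `norm_sq_stats` are illustrative choices.

```python
import math
import random

def norm_sq_stats(M, beta=1.0, trials=20_000, seed=7):
    """Sample ||h||^2 with h ~ N_C(0, beta*I_M); return the sample mean and the
    sample variance of ||h||^2 normalized by that mean."""
    rng = random.Random(seed)
    sigma = math.sqrt(beta / 2)  # std of each real and imaginary part
    samples = [
        sum(rng.gauss(0, sigma) ** 2 + rng.gauss(0, sigma) ** 2 for _ in range(M))
        for _ in range(trials)
    ]
    mean = sum(samples) / trials
    var_norm = sum((s / mean - 1.0) ** 2 for s in samples) / trials
    return mean, var_norm

for M in (1, 2, 4, 8):
    mean, var_norm = norm_sq_stats(M)
    # Compare with (5.57): mean = M*beta, and (5.58): normalized variance = 1/M
    print(M, round(mean, 3), round(var_norm, 3), round(1 / M, 3))
```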

We will now continue describing the simulation example. Figure 5.13(b) 
compares the exact outage probability Pout(1) in (5.53) with M = 2 and the 
upper bound in the right-hand side of (5.56), for different SNR values. As 
previously claimed, the upper bound overlaps with the exact curve when the 
SNR is large. At high SNRs, we can thus increase the SNR by 10 dB and
expect the outage probability to be reduced by a factor of $10^M = 100$ since
$M = 2$. This is indicated in the figure.

Recall that we want to achieve $R = 1$ bit/symbol in this example. It
is instructive to compare the fading channel with an LOS channel with
$M = 2$ antennas and an SNR of $-3$ dB because it has a matching capacity of
1 bit/symbol. There are no outage issues for such a non-fading channel: we
need the SNR to be at least $-3$ dB, and then we are guaranteed to achieve a
data rate of 1 bit/symbol. The dotted vertical line in Figure 5.13(b) indicates
this SNR level. If the i.i.d. Rayleigh fading channel has the same SNR, the
outage probability is approximately 0.6 (this is where the curves intersect), 
which is too high to get reliable communication. We must increase the SNR 
to achieve a reasonably low outage probability. This is the price to pay for 
reliability over fading channels. The price reduces when we add more receive 
antennas, thanks to the diversity gain, but there is always a need for operating 
at somewhat higher SNRs than in the corresponding non-fading channel. 
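The comparison at $-3$ dB can be made concrete with a short sketch. It assumes that the LOS SIMO channel has the array gain $M$, so that its capacity is $\log_2(1 + M\,\mathrm{SNR})$. The fading outage is evaluated with (5.53). Variable and function names are illustrative.

```python
import math

def outage_simo_exact(rate, snr, M):
    """Exact SIMO outage probability (5.53)."""
    a = (2.0**rate - 1.0) / snr
    return 1.0 - math.exp(-a) * sum(a**m / math.factorial(m) for m in range(M))

M, rate = 2, 1.0
snr = 10 ** (-3 / 10)                  # -3 dB, approximately 0.5
los_capacity = math.log2(1 + M * snr)  # non-fading SIMO channel with array gain M
print(f"LOS capacity: {los_capacity:.3f} bit/symbol")
print(f"Rayleigh fading outage at the same SNR: {outage_simo_exact(rate, snr, M):.3f}")
```

The LOS capacity is approximately 1 bit/symbol. The outage probability is about 0.59, which matches the value of roughly 0.6 read from Figure 5.13(b).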
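Example 5.7. What is the $\epsilon$-outage capacity of the SIMO channel?
The exact $\epsilon$-outage capacity $C_\epsilon$ can be obtained by solving the equation
$P_{\text{out}}(R) = \epsilon$ for $R$. The outage probability in (5.53) can be expressed as
$$
P_{\text{out}}(R) = F_{\|\mathbf{h}\|^2}\left(\frac{N_0(2^R - 1)}{q}\right), \tag{5.59}
$$
where $F_{\|\mathbf{h}\|^2}(x) = 1 - e^{-x/\beta}\sum_{m=0}^{M-1}\frac{1}{m!}\left(\frac{x}{\beta}\right)^m$ is the CDF of $\|\mathbf{h}\|^2$ with i.i.d. Rayleigh
fading. By inverting the CDF, we can compute the outage capacity as
$$
\epsilon = F_{\|\mathbf{h}\|^2}\left(\frac{N_0(2^R - 1)}{q}\right) \iff F^{-1}_{\|\mathbf{h}\|^2}(\epsilon) = \frac{N_0(2^R - 1)}{q} \iff \log_2\left(1 + \frac{q}{N_0}F^{-1}_{\|\mathbf{h}\|^2}(\epsilon)\right) = R. \tag{5.60}
$$
Hence, the $\epsilon$-outage capacity becomes
$$
C_\epsilon = \log_2\left(1 + \frac{q}{N_0}F^{-1}_{\|\mathbf{h}\|^2}(\epsilon)\right). \tag{5.61}
$$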


Unfortunately, there is no simple expression for the inverse CDF, but the 
inverse exists since the CDF is a strictly increasing function. We can use the 
expression in (5.61) for any fading distribution, not only i.i.d. Rayleigh fading. 
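Because no closed-form inverse CDF is available, one simple approach is to invert the strictly increasing CDF numerically. The following is a minimal sketch that uses bisection. It assumes $\beta = 1$ and $q/N_0 = 10$. The helper names are hypothetical, and bisection is just one possible root-finding choice. For $M = 1$, the result reduces to (5.48).

```python
import math

def cdf_norm_sq(x, M, beta=1.0):
    """CDF of ||h||^2 for h ~ N_C(0, beta*I_M), as used in (5.59)."""
    t = x / beta
    return 1.0 - math.exp(-t) * sum(t**m / math.factorial(m) for m in range(M))

def inverse_cdf(eps, M, beta=1.0, iters=200):
    """Invert the strictly increasing CDF by bisection."""
    lo, hi = 0.0, beta
    while cdf_norm_sq(hi, M, beta) < eps:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf_norm_sq(mid, M, beta) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eps_outage_capacity_simo(eps, q_over_n0, M, beta=1.0):
    """epsilon-outage capacity (5.61): log2(1 + (q/N0) * F^{-1}(eps))."""
    return math.log2(1.0 + q_over_n0 * inverse_cdf(eps, M, beta))

for M in (1, 2, 4):
    print(M, round(eps_outage_capacity_simo(0.01, 10.0, M), 3))
```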


5.3.3 Transmit Diversity in MISO Systems 


We now turn our attention to a MISO system. We know from Section 3.3 that
the capacities of SIMO and MISO channels are the same, but that result was
obtained assuming that both the transmitter and receiver know the channel
vector $\mathbf{h}$. Only the receiver knows the channel in the slow-fading scenario we
consider in this section. Hence, the receiver could apply the optimal MRC
vector $\mathbf{w} = \mathbf{h}/\|\mathbf{h}\|$ to the received signal in the SIMO system in the previous
section. By contrast, the transmitter cannot apply the optimal MRT vector
$\mathbf{p} = \mathbf{h}^*/\|\mathbf{h}\|$ in the corresponding MISO system since it does not know $\mathbf{h}$. However,
a way to achieve a diversity gain in MISO systems is to use a space-time block 
code (STBC). We will provide a few basic examples to introduce the main 
characteristics while we refer to [62] for a textbook dedicated to the topic. 

The received signal of a MISO system with linear precoding was given in
(3.41) as $y = \mathbf{h}^T\mathbf{p}z + n$. If the transmitter selects a fixed unit-norm precoding
vector $\mathbf{p}$ that is independent of the channel $\mathbf{h}$, then we obtain
$$
\mathbf{h}^T\mathbf{p} \sim \mathcal{N}_{\mathbb{C}}(0, \beta) \tag{5.62}
$$
under i.i.d. Rayleigh fading with $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta\mathbf{I}_M)$. This follows from the fact
that a weighted sum of independent Gaussian random variables is also
Gaussian distributed, and from the fact that $\mathbb{E}\{|\mathbf{h}^T\mathbf{p}|^2\} = \mathbf{p}^H\mathbb{E}\{\mathbf{h}^*\mathbf{h}^T\}\mathbf{p} = \beta\mathbf{p}^H\mathbf{I}_M\mathbf{p} = \beta$.
The effective SISO channel $\mathbf{h}^T\mathbf{p}$ that we obtained has the same fading
there is no additional diversity. To achieve transmit diversity, we need a more 
intricate transmission scheme than precoding in a fixed direction. 
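The absence of diversity with a fixed precoder can be observed empirically. The sketch below draws $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta\mathbf{I}_M)$ with $M = 4$ and $\beta = 1$, and applies an arbitrary channel-independent unit-norm $\mathbf{p}$ (equal weights, a choice made only for illustration). It then checks that $|\mathbf{h}^T\mathbf{p}|^2$ has mean $\beta$ and the same deep-fade probability as a single-antenna Rayleigh channel. The threshold $0.1\beta$ and the function name are hypothetical choices.

```python
import math
import random

def effective_gain_samples(M, beta=1.0, trials=30_000, seed=3):
    """Samples of |h^T p|^2 with h ~ N_C(0, beta*I_M) and a fixed unit-norm p.
    The equal-weight p used here is an arbitrary channel-independent choice."""
    rng = random.Random(seed)
    p = [1 / math.sqrt(M)] * M
    sigma = math.sqrt(beta / 2)
    gains = []
    for _ in range(trials):
        h = [complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(M)]
        gains.append(abs(sum(hm * pm for hm, pm in zip(h, p))) ** 2)
    return gains

gains = effective_gain_samples(M=4)
mean_gain = sum(gains) / len(gains)
# Probability of a deep fade |h^T p|^2 < 0.1*beta; for N_C(0, beta) it is 1 - exp(-0.1)
deep_fade = sum(g < 0.1 for g in gains) / len(gains)
print(round(mean_gain, 3), round(deep_fade, 4), round(1 - math.exp(-0.1), 4))
```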

The technique for achieving the maximum transmit diversity with M = 2 
is known as the Alamouti code because it was first proposed by Alamouti in 
[36]. The main idea is to transmit the same set of two data symbols two times 
using different precoding. The precoding vectors are not selected based on 
the channel $\mathbf{h} = [h_1, h_2]^T$ but in a clever way that works for any realization of
the channel and yet enables the receiver to separate the data symbols.

We consider two consecutive transmissions over the MISO channel in (3.35) 
with time indices l = 1 and l = 2: 


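$$
y[1] = \sum_{m=1}^{2} h_m x_m[1] + n[1], \tag{5.63}
$$
$$
y[2] = \sum_{m=1}^{2} h_m x_m[2] + n[2], \tag{5.64}
$$
where $y[l]$ is the received signal at time $l$, $x_m[l]$ is the transmitted signal from
the $m$th antenna, and $n[l]$ is the noise. We can write this entire system in
matrix form as
$$
\underbrace{\begin{bmatrix} y[1] \\ y[2] \end{bmatrix}}_{=\mathbf{y}} = \underbrace{\begin{bmatrix} x_1[1] & x_2[1] \\ x_1[2] & x_2[2] \end{bmatrix}}_{=\mathbf{X}} \underbrace{\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}}_{=\mathbf{h}} + \underbrace{\begin{bmatrix} n[1] \\ n[2] \end{bmatrix}}_{=\mathbf{n}}. \tag{5.65}
$$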
The data symbols should be embedded into the matrix X, where each column 
represents the signals transmitted at a specific antenna (the space dimension), 
and each row contains the signals transmitted simultaneously (the time 
dimension). We want to send the two data symbols $z[1]$ and $z[2]$ over the two
considered time instances. Ideally, we would send one after the other using
MRT with $\mathbf{X} = [z[1], z[2]]^T\mathbf{h}^H/\|\mathbf{h}\|$, but this requires channel knowledge at the
transmitter. Alamouti proposed a way to achieve a similar result without
having to know the channel by embedding $z[1]$ and $z[2]$ into $\mathbf{X}$ as
$$
\mathbf{X} = \frac{1}{\sqrt{2}}\begin{bmatrix} z[1] & z[2] \\ -z^*[2] & z^*[1] \end{bmatrix}. \tag{5.66}
$$
The scaling factor $1/\sqrt{2}$ in front of the matrix ensures that the transmit power
at each time instance equals the average power of the data symbols. This 
signal matrix does not depend on the channel realization h. Each column 
in $\mathbf{X}$ represents what one of the antennas transmits over two different time
instances. In the first instance, the first antenna sends the first symbol, and
the second antenna sends the second symbol. Next, the antennas transmit the
opposite symbols with complex conjugates and a minus sign on one of the
antennas: the first antenna sends $-z^*[2]$ and the second antenna sends $z^*[1]$.
The operation of taking a block of data symbols and mapping them
to the antennas over a time block is called space-time encoding. Figure 5.14
illustrates the encoding operation for the Alamouti code.

Figure 5.14: The Alamouti space-time encoding takes a sequence of two data symbols $z[1], z[2]$
and transmits them over two antennas according to (5.66).

The pattern of how the data symbols are mapped to the antennas in X is 
carefully designed so that (5.65) can be written as 


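$$
\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{n} = \frac{1}{\sqrt{2}}\begin{bmatrix} z[1] & z[2] \\ -z^*[2] & z^*[1] \end{bmatrix}\begin{bmatrix} h_1 \\ h_2 \end{bmatrix} + \mathbf{n} = \frac{1}{\sqrt{2}}\begin{bmatrix} h_1 z[1] + h_2 z[2] \\ -h_1 z^*[2] + h_2 z^*[1] \end{bmatrix} + \mathbf{n}. \tag{5.67}
$$
If we take the conjugate of the second row in (5.67), we obtain
$$
\bar{\mathbf{y}} = \begin{bmatrix} y[1] \\ y^*[2] \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} h_1 z[1] + h_2 z[2] \\ h_2^* z[1] - h_1^* z[2] \end{bmatrix} + \begin{bmatrix} n[1] \\ n^*[2] \end{bmatrix} = \underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix}}_{=\mathbf{H}}\begin{bmatrix} z[1] \\ z[2] \end{bmatrix} + \underbrace{\begin{bmatrix} n[1] \\ n^*[2] \end{bmatrix}}_{=\bar{\mathbf{n}}}. \tag{5.68}
$$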


By comparing (5.68) with (3.56), we notice that it has the same form as a 
2x2 MIMO system with the channel matrix H. In fact, the Alamouti code has 
been selected so that this matrix has orthogonal columns. This implies that 
the receiver observes the two signals in two different orthogonal dimensions 
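of the vector space so that the signals can be distinguished without mutual
interference. The SVD $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ of the channel matrix in (5.68) has the
simple form
$$
\mathbf{H} = \underbrace{\frac{1}{\|\mathbf{h}\|}\begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix}}_{=\mathbf{U}} \underbrace{\begin{bmatrix} \|\mathbf{h}\|/\sqrt{2} & 0 \\ 0 & \|\mathbf{h}\|/\sqrt{2} \end{bmatrix}}_{=\boldsymbol{\Sigma}} \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{=\mathbf{V}^H}, \tag{5.69}
$$

Figure 5.15: A block diagram of the transmission and reception using the Alamouti code. The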
transmitter maps two data symbols into a transmission block X, as detailed in Figure 5.14. The 
receiver conjugates the second received signal to obtain y in (5.68). It then acts as a MIMO 
receiver that uses the left singular vectors from (5.69) to decouple the two transmitted symbols. 


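where both singular values are equal to $\|\mathbf{h}\|/\sqrt{2}$. If we multiply $\bar{\mathbf{y}}$ in (5.68)
by $\mathbf{U}^H$ from the left, we decouple the reception into two parallel channels:
$$
\mathbf{U}^H\bar{\mathbf{y}} = \frac{\|\mathbf{h}\|}{\sqrt{2}}\begin{bmatrix} z[1] \\ z[2] \end{bmatrix} + \mathbf{U}^H\bar{\mathbf{n}}, \tag{5.70}
$$
where $\mathbf{U}^H\bar{\mathbf{n}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_2)$. Since the singular values are equally large, it is
optimal to allocate the transmit power equally between $z[1]$ and $z[2]$:
$$
\begin{bmatrix} z[1] \\ z[2] \end{bmatrix} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, q\mathbf{I}_2). \tag{5.71}
$$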
The transmitter sends these signals without utilizing the channel coefficients
since $\mathbf{V}$ in (5.69) is an identity matrix. The block diagram shown in Figure 5.15
summarizes the space-time encoding and decoding. Two data symbols are
mapped to $\mathbf{X}$, which is then sent over the channel over two different time
instances to obtain $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{n}$. At the receiver, the second entry of $\mathbf{y}$ is
conjugated to obtain $\bar{\mathbf{y}}$, which is then multiplied by $\mathbf{U}^H$ that originates from
the SVD in (5.69). This decouples the transmission into two parallel channels,
each having the channel coefficient $\|\mathbf{h}\|/\sqrt{2}$ and independent additive noise
with variance $N_0$. The data can be encoded and decoded separately over these
channels; thus, no non-linear processing is required.
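The encoding in (5.66) and the decoding chain of Figure 5.15 (conjugating $y[2]$, then multiplying by $\mathbf{U}^H$ from (5.69)) can be sketched with Python's built-in complex numbers. This is a noise-free illustration. The channel realization, the two example symbols, and the helper names are arbitrary assumptions. Note that the transmitter never uses $\mathbf{h}$, while the receiver uses it only to form $\mathbf{U}^H$.

```python
import math

def alamouti_encode(z1, z2):
    """Code matrix X in (5.66): rows are time instances, columns are antennas."""
    s = 1 / math.sqrt(2)
    return [[s * z1, s * z2],
            [-s * z2.conjugate(), s * z1.conjugate()]]

def alamouti_decode(y1, y2, h1, h2):
    """Form y_bar = [y[1], y*[2]] as in (5.68) and apply U^H from (5.69).
    Returns (r1, r2, gain) with [r1, r2] = gain * [z1, z2] + noise, gain = ||h||/sqrt(2)."""
    norm_h = math.sqrt(abs(h1) ** 2 + abs(h2) ** 2)
    yb1, yb2 = y1, y2.conjugate()
    # U = (1/||h||) [[h1, h2], [h2*, -h1*]]  =>  U^H = (1/||h||) [[h1*, h2], [h2*, -h1]]
    r1 = (h1.conjugate() * yb1 + h2 * yb2) / norm_h
    r2 = (h2.conjugate() * yb1 - h1 * yb2) / norm_h
    return r1, r2, norm_h / math.sqrt(2)

# Usage with one arbitrary channel realization and two example symbols (noise-free)
h1, h2 = complex(0.3, -1.1), complex(-0.7, 0.4)
z1, z2 = complex(1, 1), complex(-1, 1)
X = alamouti_encode(z1, z2)
y1 = X[0][0] * h1 + X[0][1] * h2  # y = X h, first time instance
y2 = X[1][0] * h1 + X[1][1] * h2  # second time instance
r1, r2, gain = alamouti_decode(y1, y2, h1, h2)
print(r1 / gain, r2 / gain)  # recovers z1 and z2 up to rounding errors
```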

Note that each symbol is transmitted with the power $q$ since the total
power over two channel instances is $2q$, and it should be equally distributed
over $z[1]$ and $z[2]$. Hence, for a given channel realization $\mathbf{h}$, it follows from
(3.75) that the (conditional) capacity over the two time instances is
$$
C_{\mathbf{h}} = 2\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{2N_0}\right) \text{ bit per two symbols}. \tag{5.72}
$$
Since we are used to expressing the capacity in bit/symbol, it is more
convenient to rewrite (5.72) as
$$
C_{\mathbf{h}} = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{2N_0}\right) \text{ bit/symbol}. \tag{5.73}
$$
If we compare (5.73) with the SIMO case in (5.50), we notice that the
only difference is that we only get half the SNR in the MISO case. This is
because the signals are transmitted isotropically instead of being beamformed
towards the receiver; thus, the beamforming gain is lost. If we consider
i.i.d. Rayleigh fading with $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta\mathbf{I}_2)$, then the average SNR in (5.73)
becomes $\mathbb{E}\{q\|\mathbf{h}\|^2/(2N_0)\} = q\beta/N_0$, which does not depend on the number of antennas.
This confirms that there is no beamforming gain.

Although the transmitter can construct $\mathbf{X}$ without knowing the channel,
it cannot compute $C_{\mathbf{h}}$, so it does not know how much data to encode into
$z[1], z[2]$. If the transmitter selects the rate $R$, the outage probability with
i.i.d. Rayleigh fading can be computed as
$$
P_{\text{out}}(R) = \Pr\left\{R > \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{2N_0}\right)\right\} = 1 - e^{-\frac{2(2^R-1)}{\mathrm{SNR}}}\sum_{m=0}^{1}\frac{1}{m!}\left(\frac{2(2^R-1)}{\mathrm{SNR}}\right)^m \tag{5.74}
$$
by following the same integration-by-parts approach as in (5.53). The average
SNR is still defined as $\mathrm{SNR} = q\beta/N_0$, but only half of this value is achieved when
communicating in this way. Moreover, the compact upper bound
$$
P_{\text{out}}(R) \leq \left(\frac{2N_0(2^R - 1)}{q\beta}\right)^2\frac{1}{2} = \frac{1}{2}\left(\frac{2^R - 1}{\mathrm{SNR}/2}\right)^2 \tag{5.75}
$$
can be obtained by following the same steps as in (5.54)-(5.56). Recall from
Section 5.3.2 that this bound is tight at high SNRs. We can see in (5.75) that
the outage probability reduces as $\mathrm{SNR}^{-2}$. This means the Alamouti code
achieves a diversity gain of order $M = 2$, the same maximum diversity order
as in the SIMO case with the matching number of antennas.

Figure 5.16 shows the outage probability for a channel with i.i.d. Rayleigh 
fading and the desired rate R = 1 bit/symbol. We compare a SISO system 
(M = 1) with the receive diversity obtained by a SIMO system (M = 2) and 
the transmit diversity obtained by a MISO system (M = 2) using the Alamouti 
code. The diversity gains are clearly visible: the outage probabilities decay
as $\mathrm{SNR}^{-M}$. This demonstrates that the diversity orders obtained by transmit
and receive diversity are the same. However, there is a 3 dB gap between the
curves. This is because receive diversity also gives rise to a beamforming gain
that doubles the SNR, corresponding to a 3 dB improvement. This can be
observed mathematically by comparing (5.75) with (5.56) for M = 2, where 
the only difference is that the SNR is divided by two in the MISO case. 
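The 3 dB gap can also be checked numerically. (5.74) is the $M = 2$ SIMO expression (5.53) evaluated at half the SNR, so on curves with slope $-2$ the MISO outage probability should approach $2^2 = 4$ times the SIMO outage probability at high SNR. This is a sketch with hypothetical helper names and $R = 1$ bit/symbol.

```python
import math

def outage_simo_exact(rate, snr, M):
    """Exact SIMO outage probability (5.53)."""
    a = (2.0**rate - 1.0) / snr
    return 1.0 - math.exp(-a) * sum(a**m / math.factorial(m) for m in range(M))

def outage_alamouti(rate, snr):
    """Exact Alamouti outage probability (5.74): the M = 2 expression with the SNR halved."""
    return outage_simo_exact(rate, snr / 2, 2)

for snr_db in (10, 20, 30):
    snr = 10 ** (snr_db / 10)
    simo = outage_simo_exact(1, snr, 2)
    miso = outage_alamouti(1, snr)
    print(snr_db, f"SIMO {simo:.2e}", f"MISO {miso:.2e}", f"ratio {miso / simo:.2f}")
```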
Figure 5.16: The outage probability of MISO and SIMO systems with M = 2 and i.i.d. Rayleigh
fading is compared with the corresponding SISO system. The rate is R = 1 bit/symbol. The 
MISO and SIMO systems achieve the same diversity order, but the MISO system has a 3 dB 
worse SNR since it cannot obtain a beamforming gain. 


Example 5.8. Show that $\mathbf{X}\mathbf{X}^H$ is a scaled identity matrix when using the
Alamouti code. How is $\mathbb{E}\{\mathrm{tr}(\mathbf{X}\mathbf{X}^H)\}$ related to the transmit power?
We can compute the matrix product using (5.66), which leads to
$$
\mathbf{X}\mathbf{X}^H = \frac{1}{2}\begin{bmatrix} z[1] & z[2] \\ -z^*[2] & z^*[1] \end{bmatrix}\begin{bmatrix} z^*[1] & -z[2] \\ z^*[2] & z[1] \end{bmatrix} = \frac{1}{2}\left(|z[1]|^2 + |z[2]|^2\right)\mathbf{I}_2. \tag{5.76}
$$
This is a scaled identity matrix, which implies that the rows of $\mathbf{X}$ are orthogonal.
All STBCs that satisfy this condition are called orthogonal and share
the property that the receiver can separate the transmitted signals without
interference, as was the case for the Alamouti code. The scaling factor in (5.76)
ensures that $\mathrm{tr}(\mathbf{X}\mathbf{X}^H) = |z[1]|^2 + |z[2]|^2$, which implies that $\mathbb{E}\{\mathrm{tr}(\mathbf{X}\mathbf{X}^H)\} = 2q$,
so that the total power of one block equals the power $q$ per symbol times the
length of the block.
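The identity (5.76) can be checked for a random pair of symbols by forming $\mathbf{X}\mathbf{X}^H$ explicitly. The following sketch stores matrices as lists of rows. The random symbol draw and the helper names are illustrative only.

```python
import math
import random

def alamouti_encode(z1, z2):
    """Code matrix X in (5.66)."""
    s = 1 / math.sqrt(2)
    return [[s * z1, s * z2], [-s * z2.conjugate(), s * z1.conjugate()]]

def gram(X):
    """Compute X X^H: entry (i, j) is row i times the conjugate of row j."""
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) for rj in X] for ri in X]

rng = random.Random(0)
z1 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
z2 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
G = gram(alamouti_encode(z1, z2))
expected = 0.5 * (abs(z1) ** 2 + abs(z2) ** 2)  # diagonal value predicted by (5.76)
print(G)
print(expected)
```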


The Alamouti code is designed for transmitting two data symbols over 
M = 2 antennas, but there are orthogonal STBCs crafted for larger numbers 
of transmit antennas and more symbols. The code design is nontrivial and 
has attracted much research attention over the past decades, starting with 
[37]. There are three main design parameters: i) the number of transmit 
antennas, ii) the number of time instances the code is transmitted over, and 
iii) how many data symbols are embedded. It is also essential to ensure that 
the symbols are assigned an equal fraction of the transmit power and obtain 
the maximum diversity order. When having $M > 2$ transmit antennas, only
orthogonal codes that use more time instances than there are symbols exist, and it can
be proved that the ratio between the number of symbols and time instances cannot surpass 3/4 [62]. The Alamouti code is
the only code with as many symbols as there are time instances.’ 

We will conclude this section by describing an orthogonal code from [63] 
designed for M = 4 antennas, which we will refer to as the Ganesan code 
since Ganesan and Stoica proposed it. The code transmits the 3 data symbols 
$z[1], z[2], z[3]$ over 4 time instances, leading to the coding rate 3/4. The
code matrix is 


(5.77) 


and has the property $\mathbf{X}\mathbf{X}^H = \frac{1}{3}\left(|z[1]|^2 + |z[2]|^2 + |z[3]|^2\right)\mathbf{I}_4$ that is expected
from Example 5.8. The scaling factor in front of the matrix ensures that the 
transmit power at each time instance equals the average power of the data 
symbols. Each row of X has the same norm; thus, the transmit power is 
divided equally over time. We also notice that each symbol is transmitted 
from each antenna, which is a prerequisite for achieving maximum diversity. 
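The orthogonality of (5.77) is easy to confirm numerically. The following minimal Python sketch (standard library only) draws three symbols from $\mathcal{N}_{\mathbb{C}}(0,1)$, builds the code matrix as written in (5.77), and checks that $\mathbf{X}\mathbf{X}^{\mathrm{H}}$ is diagonal with all diagonal entries equal to $\frac{1}{3}(|z[1]|^2+|z[2]|^2+|z[3]|^2)$. The helper names `ganesan_code`, `gram`, and `complex_gaussian`, the unit symbol variance, and the seed are our own illustrative choices.

```python
import math
import random


def ganesan_code(z1, z2, z3):
    """Code matrix X in (5.77): row l is time instance l, column m is antenna m."""
    rows = [
        [z1, 0, z2, -z3],
        [0, z1, z3.conjugate(), z2.conjugate()],
        [-z2.conjugate(), -z3, z1.conjugate(), 0],
        [z3.conjugate(), -z2, 0, z1.conjugate()],
    ]
    return [[entry / math.sqrt(3) for entry in row] for row in rows]


def gram(X):
    """Return X X^H; entry (i, j) is the inner product of rows i and j."""
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) for rj in X] for ri in X]


def complex_gaussian(rng, var=1.0):
    """One sample from N_C(0, var): independent real and imaginary parts."""
    std = math.sqrt(var / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))


rng = random.Random(0)
z = [complex_gaussian(rng) for _ in range(3)]
G = gram(ganesan_code(*z))
expected_diag = sum(abs(v) ** 2 for v in z) / 3   # (|z[1]|^2 + |z[2]|^2 + |z[3]|^2) / 3

max_offdiag = max(abs(G[i][j]) for i in range(4) for j in range(4) if i != j)
max_diag_err = max(abs(G[i][i] - expected_diag) for i in range(4))
print(f"largest off-diagonal magnitude: {max_offdiag:.1e}")
print(f"largest deviation on the diagonal: {max_diag_err:.1e}")
```

Both printed numbers should be at the level of floating-point round-off; a sign error in any entry of the matrix would instead show up as an off-diagonal term of order one.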
The received signal over the four different time instances can be expressed as 


$$ \underbrace{\begin{bmatrix} y[1] \\ y[2] \\ y[3] \\ y[4] \end{bmatrix}}_{=\mathbf{y}} = \mathbf{X}\mathbf{h} + \underbrace{\begin{bmatrix} n[1] \\ n[2] \\ n[3] \\ n[4] \end{bmatrix}}_{=\mathbf{n}}, \tag{5.78} $$

where $\mathbf{h} = [h_1, h_2, h_3, h_4]^{\mathrm{T}}$ is the channel response, $y[l]$ is the received signal at time $l$, and $n[l]$ is receiver noise at time $l$, for $l = 1, \ldots, 4$. Since the data symbols appear in (5.77) both with and without complex conjugates, it is convenient for decoding purposes to extend the system model in (5.78) also to include the conjugates of the received signals:


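$$ \begin{bmatrix} \mathbf{y} \\ \mathbf{y}^* \end{bmatrix} = \begin{bmatrix} \mathbf{X}\mathbf{h} \\ \mathbf{X}^*\mathbf{h}^* \end{bmatrix} + \begin{bmatrix} \mathbf{n} \\ \mathbf{n}^* \end{bmatrix} = \underbrace{\frac{1}{\sqrt{3}}\begin{bmatrix}
h_1 & h_3 & -h_4 & 0 & 0 & 0 \\
h_2 & 0 & 0 & 0 & h_4 & h_3 \\
0 & 0 & -h_2 & h_3 & -h_1 & 0 \\
0 & -h_2 & 0 & h_4 & 0 & h_1 \\
0 & 0 & 0 & h_1^* & h_3^* & -h_4^* \\
0 & h_4^* & h_3^* & h_2^* & 0 & 0 \\
h_3^* & -h_1^* & 0 & 0 & 0 & -h_2^* \\
h_4^* & 0 & h_1^* & 0 & -h_2^* & 0
\end{bmatrix}}_{=\bar{\mathbf{H}}}\begin{bmatrix} z[1] \\ z[2] \\ z[3] \\ z^*[1] \\ z^*[2] \\ z^*[3] \end{bmatrix} + \begin{bmatrix} \mathbf{n} \\ \mathbf{n}^* \end{bmatrix}. \tag{5.79} $$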


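⁷There exist non-orthogonal STBCs that contain as many data symbols as there are time slots, but these require more complicated receiver processing to deal with the resulting interference; we refer to [62] for details.

If we compare (5.79) with (3.56), we notice that it has the same form as an $8 \times 6$ MIMO system with the channel matrix $\bar{\mathbf{H}}$. The critical difference is that the last three entries of the symbol vector in (5.79) are complex conjugates of the first three, so they carry no extra information. All the columns of $\bar{\mathbf{H}}$ are mutually orthogonal and have norms equal to $\|\mathbf{h}\|/\sqrt{3}$, thanks to how the code matrix in (5.77) was designed. This implies that $\bar{\mathbf{H}}^{\mathrm{H}}\bar{\mathbf{H}} = \frac{\|\mathbf{h}\|^2}{3}\mathbf{I}_6$. Hence, if we multiply the extended received signal in (5.79) from the left with $\frac{\sqrt{3}}{\|\mathbf{h}\|}\bar{\mathbf{H}}^{\mathrm{H}}$ (which has unit-norm rows), we obtain

$$ \frac{\sqrt{3}}{\|\mathbf{h}\|}\bar{\mathbf{H}}^{\mathrm{H}}\begin{bmatrix} \mathbf{y} \\ \mathbf{y}^* \end{bmatrix} = \frac{\|\mathbf{h}\|}{\sqrt{3}}\begin{bmatrix} z[1] \\ z[2] \\ z[3] \\ z^*[1] \\ z^*[2] \\ z^*[3] \end{bmatrix} + \frac{\sqrt{3}}{\|\mathbf{h}\|}\bar{\mathbf{H}}^{\mathrm{H}}\begin{bmatrix} \mathbf{n} \\ \mathbf{n}^* \end{bmatrix}. \tag{5.80} $$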
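The extended model (5.79) and the processing in (5.80) can be checked in the same way. The minimal sketch below assumes noiseless reception, so that only the algebra is tested: it computes $\mathbf{X}\mathbf{h}$ with the code matrix in (5.77), verifies that the $8 \times 6$ matrix of (5.79) reproduces the stacked vector of received signals and their conjugates, checks that the columns of that matrix are orthogonal with squared norm $\|\mathbf{h}\|^2/3$, and applies the normalized matched filter of (5.80). The helper names (`extended_channel`, `matvec`, `conj_transpose`, and so on) are hypothetical names chosen for this example.

```python
import math
import random


def conj(v):
    return v.conjugate()


def cgauss(rng, var=1.0):
    """One sample from N_C(0, var)."""
    std = math.sqrt(var / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))


def ganesan_code(z1, z2, z3):
    """Code matrix X in (5.77)."""
    rows = [[z1, 0, z2, -z3],
            [0, z1, conj(z3), conj(z2)],
            [-conj(z2), -z3, conj(z1), 0],
            [conj(z3), -z2, 0, conj(z1)]]
    return [[e / math.sqrt(3) for e in row] for row in rows]


def extended_channel(h):
    """8 x 6 matrix of (5.79), acting on [z1, z2, z3, z1*, z2*, z3*]."""
    h1, h2, h3, h4 = h
    rows = [[h1, h3, -h4, 0, 0, 0],
            [h2, 0, 0, 0, h4, h3],
            [0, 0, -h2, h3, -h1, 0],
            [0, -h2, 0, h4, 0, h1],
            [0, 0, 0, conj(h1), conj(h3), -conj(h4)],
            [0, conj(h4), conj(h3), conj(h2), 0, 0],
            [conj(h3), -conj(h1), 0, 0, 0, -conj(h2)],
            [conj(h4), 0, conj(h1), 0, -conj(h2), 0]]
    return [[e / math.sqrt(3) for e in row] for row in rows]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def conj_transpose(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


rng = random.Random(1)
h = [cgauss(rng) for _ in range(4)]
z = [cgauss(rng) for _ in range(3)]
s_ext = z + [conj(v) for v in z]                    # [z; z*]

y = matvec(ganesan_code(*z), h)                     # noiseless y = X h, cf. (5.78)
y_ext = y + [conj(v) for v in y]                    # [y; y*]
Hbar = extended_channel(h)
HbarH = conj_transpose(Hbar)
norm_h = math.sqrt(sum(abs(v) ** 2 for v in h))

# 1) The 8 x 6 model reproduces [y; y*]
model_err = max(abs(a - b) for a, b in zip(y_ext, matvec(Hbar, s_ext)))
# 2) Hbar^H Hbar = (||h||^2 / 3) I_6
columns = list(zip(*Hbar))
gram_err = max(abs(sum(a * b for a, b in zip(HbarH[i], columns[j]))
                   - (norm_h ** 2 / 3 if i == j else 0.0))
               for i in range(6) for j in range(6))
# 3) The processing in (5.80), without noise, returns (||h|| / sqrt(3)) [z; z*]
r = [math.sqrt(3) / norm_h * v for v in matvec(HbarH, y_ext)]
decode_err = max(abs(r[k] - norm_h / math.sqrt(3) * s_ext[k]) for k in range(6))
print(model_err, gram_err, decode_err)
```

All three errors should be of the order of floating-point round-off. Adding noise to `y` would leave the first check unchanged and add the noise term of (5.80) to `r`.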


We notice that the data symbols $z[1], z[2], z[3]$ are received separately in the first three entries of (5.80) and exhibit a common channel coefficient of $\|\mathbf{h}\|/\sqrt{3}$. It can also be shown that the first three entries of the noise term in (5.80) are independent and have variance $N_0$, thanks to the normalization factor $\sqrt{3}/\|\mathbf{h}\|$ and the fact that no noise variable appears together with its complex conjugate in the same entry. Since the channel gain is the same for all three symbols, it is optimal for the transmitter to allocate the power equally between them:

$$ \begin{bmatrix} z[1] \\ z[2] \\ z[3] \end{bmatrix} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, q\mathbf{I}_3). \tag{5.81} $$


It follows that $\mathbb{E}\{\mathrm{tr}(\mathbf{X}\mathbf{X}^{\mathrm{H}})\} = \frac{1}{3}\mathbb{E}\{|z[1]|^2 + |z[2]|^2 + |z[3]|^2\}\,\mathrm{tr}(\mathbf{I}_4) = 4q$, which is the power $q$ times the length of the block. For a given channel realization $\mathbf{h}$, it follows from (3.75) that the (conditional) capacity is

$$ C_{\mathbf{h}} = \frac{3}{4}\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{3N_0}\right) \text{ bit per symbol}. \tag{5.82} $$

The word “symbol” in the unit refers to the transmitted symbols, not the data symbols. Since we transmit a block of four symbols to transfer three data symbols, the coding rate $\eta_{\mathrm{c}} = 3/4$ appears as a pre-log factor in (5.82). Compared to the (conditional) capacity in the SIMO case in (5.50), we notice that the Ganesan code only gets a third of the SNR. There are two contributing factors: the lack of beamforming reduces the SNR by a factor $1/M = 1/4$, while spreading three symbols over four time slots increases it by a factor $1/\eta_{\mathrm{c}} = 4/3$. The combined effect is a factor $1/(M\eta_{\mathrm{c}}) = 1/3$.
The main reason to use an STBC is to achieve the maximum diversity order the channel can offer. The outage probability for a given rate $R$ and i.i.d. Rayleigh fading can be computed as

$$ P_{\mathrm{out}}(R) = \Pr\left\{R > \frac{3}{4}\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{3N_0}\right)\right\} = 1 - e^{-\frac{3(2^{4R/3}-1)}{\mathrm{SNR}}}\sum_{m=0}^{3}\frac{1}{m!}\left(\frac{3(2^{4R/3}-1)}{\mathrm{SNR}}\right)^{m} \tag{5.83} $$

by following the approach in (5.53). To enable comparison with the SISO case, the average SNR is still defined as $\mathrm{SNR} = q\beta/N_0$, although only a third of it is achieved. An upper bound that is tight at high SNRs is obtained as

$$ P_{\mathrm{out}}(R) \leq \left(\frac{3(2^{4R/3}-1)}{\mathrm{SNR}}\right)^{4}\frac{1}{4!} \tag{5.84} $$

by following the same steps as in (5.54)-(5.56). We can see in (5.84) that the outage probability reduces as $\mathrm{SNR}^{-4}$, which implies that the diversity order is $M = 4$, the same as in the SIMO case with four antennas.
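To see the diversity order emerge from (5.83), the sketch below evaluates the exact outage probability and the bound (5.84) for $R = 1$ bit/symbol and estimates the decay per decade of SNR between 30 dB and 40 dB, which should be close to four. The helper `gamma_cdf` (our own name) computes $1 - e^{-x}\sum_{m=0}^{M-1}x^m/m!$, the CDF of $\|\mathbf{h}\|^2/\beta$ under i.i.d. Rayleigh fading; for small arguments it sums the complementary series instead, which avoids the cancellation that would otherwise spoil the tiny probabilities at high SNR.

```python
import math


def gamma_cdf(x, M):
    """Pr{G <= x} for G ~ Gamma(M, 1), i.e., 1 - exp(-x) * sum_{m=0}^{M-1} x^m / m!."""
    if x <= 0:
        return 0.0
    if x >= M:
        partial = sum(x ** m / math.factorial(m) for m in range(M))
        return 1.0 - math.exp(-x) * partial
    # For x < M, sum the tail exp(-x) * sum_{m>=M} x^m / m! instead, which stays
    # accurate when the probability is tiny (i.e., at high SNR).
    term = math.exp(-x) * x ** M / math.factorial(M)
    total, k = term, M
    while term > 1e-18 * total:
        k += 1
        term *= x / k
        total += term
    return total


def ganesan_outage(R, snr):
    """Exact outage probability (5.83)."""
    return gamma_cdf(3 * (2 ** (4 * R / 3) - 1) / snr, 4)


def ganesan_bound(R, snr):
    """High-SNR upper bound (5.84)."""
    return (3 * (2 ** (4 * R / 3) - 1) / snr) ** 4 / math.factorial(4)


R = 1.0
table = {}
for snr_db in (0, 10, 20, 30, 40):
    snr = 10 ** (snr_db / 10)
    table[snr_db] = (ganesan_outage(R, snr), ganesan_bound(R, snr))
    print(f"{snr_db:>2} dB: exact {table[snr_db][0]:.3e}, bound {table[snr_db][1]:.3e}")

# Decrease of log10(P_out) over one decade of SNR (30 dB -> 40 dB)
diversity_estimate = math.log10(table[30][0]) - math.log10(table[40][0])
print(f"estimated diversity order: {diversity_estimate:.3f}")
```

At 0 dB the bound exceeds one and is useless, while at 30 dB and above the exact value and the bound almost coincide, which is where the slope reveals the diversity order.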


Example 5.9. What diversity order is achieved by a repetition scheme where the same signal $x \sim \mathcal{N}_{\mathbb{C}}(0,q)$ is transmitted sequentially from $M$ antennas over $M$ time instances, using only one antenna at a time?

The received signal with this repetition scheme can be expressed as

$$ \underbrace{\begin{bmatrix} y[1] \\ \vdots \\ y[M] \end{bmatrix}}_{=\mathbf{y}} = \underbrace{\begin{bmatrix} h_1 \\ \vdots \\ h_M \end{bmatrix}}_{=\mathbf{h}} x + \underbrace{\begin{bmatrix} n[1] \\ \vdots \\ n[M] \end{bmatrix}}_{=\mathbf{n}} \tag{5.85} $$

if we transmit the signal from antenna $m$ at time instance $m$ to obtain the received signal $y[m]$. The system model in (5.85) has the same form as the SIMO system in (3.14). Hence, the (conditional) capacity is obtained from (3.22) as $C_{\mathbf{h}} = \frac{1}{M}\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)$, where the pre-log factor $1/M$ represents that the same symbol is repeated over $M$ time instances. Assuming i.i.d. Rayleigh fading, we can follow (5.54)-(5.56) to upper bound the outage probability as

$$ P_{\mathrm{out}}(R) = \Pr\left\{R > \frac{1}{M}\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)\right\} \leq \left(\frac{2^{MR}-1}{\mathrm{SNR}}\right)^{M}\frac{1}{M!}. \tag{5.86} $$

The diversity order is $M$ since the outage probability reduces as $\mathrm{SNR}^{-M}$, which is the maximum value. The drawback is the inefficient use of time resources; transmitting one symbol over $M$ time instances leads to a coding rate of only $1/M$. For this reason, the outage probability is proportional to $(2^{MR}-1)^M$ instead of $(2^R-1)^M$ as in the SIMO case in (5.56).




[Plot for Figure 5.17: outage probability (logarithmic scale, $10^{-4}$ to $10^{0}$) versus SNR [dB] from $-10$ to $30$; curves: SISO: $M = 1$; STBC: $M = 2$; STBC: $M = 4$; Repetition: $M = 4$.]


Figure 5.17: The outage probability of MISO systems with $M = 2$ or $M = 4$ antennas and i.i.d. Rayleigh fading is compared with the corresponding SISO system. The rate is $R = 1$ bit/symbol. The STBCs achieve diversity orders equal to the number of antennas and outperform the repetition scheme for the same number of antennas.


Figure 5.17 shows how the outage probability varies with the SNR when the rate is $R = 1$ bit/symbol and there is i.i.d. Rayleigh fading. We compare a SISO system ($M = 1$) with three schemes that achieve transmit diversity over MISO channels: the Alamouti code in (5.66) with $M = 2$, the Ganesan code in (5.77) with $M = 4$, and the repetition scheme from Example 5.9 with $M = 4$. The SISO system achieves the lowest outage probability when the SNR is very low, while the benefit of diversity becomes evident at medium to high SNR. Although the Alamouti code has a coding rate of $\eta_{\mathrm{c}} = 1$ and the Ganesan code only has $\eta_{\mathrm{c}} = 3/4$, the latter achieves a larger diversity order, which leads to a lower outage probability for SNRs above approximately 1 dB. The repetition scheme's inefficiency is evident from the wide performance gap to the Ganesan code that uses the same number of antennas. However, it outperforms the Alamouti code at high SNRs thanks to the larger diversity order.
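The curves in Figure 5.17 can be reproduced from closed-form expressions of the same type. Under i.i.d. Rayleigh fading, each scheme has a conditional capacity of the form $\eta\log_2(1 + a\,q\|\mathbf{h}\|^2/N_0)$, so its outage probability is the Gamma($M$,1) CDF evaluated at $(2^{R/\eta}-1)/(a\,\mathrm{SNR})$. The parameter triples $(\eta, a, M)$ below are read off the text: $(1, 1, 1)$ for SISO, $(1, 1/2, 2)$ for the Alamouti code as in (5.74), $(3/4, 1/3, 4)$ for the Ganesan code as in (5.82), and $(1/4, 1, 4)$ for the repetition scheme as in (5.86). This minimal sketch tabulates the four schemes for $R = 1$ bit/symbol and locates by bisection the SNR where the Ganesan curve drops below the Alamouti curve; the helper names are our own.

```python
import math


def gamma_cdf(x, M):
    """Pr{G <= x} for G ~ Gamma(M, 1), accurate also when the result is tiny."""
    if x <= 0:
        return 0.0
    if x >= M:
        return 1.0 - math.exp(-x) * sum(x ** m / math.factorial(m) for m in range(M))
    term = math.exp(-x) * x ** M / math.factorial(M)   # first term of the tail series
    total, k = term, M
    while term > 1e-18 * total:
        k += 1
        term *= x / k
        total += term
    return total


# (pre-log factor eta, SNR share a, diversity order M): C_h = eta*log2(1 + a*q*||h||^2/N0)
SCHEMES = {
    "SISO": (1.0, 1.0, 1),
    "Alamouti": (1.0, 1 / 2, 2),
    "Ganesan": (3 / 4, 1 / 3, 4),
    "Repetition": (1 / 4, 1.0, 4),
}


def outage(scheme, R, snr):
    eta, a, M = SCHEMES[scheme]
    return gamma_cdf((2 ** (R / eta) - 1) / (a * snr), M)


def db(x_db):
    return 10 ** (x_db / 10)


R = 1.0
for snr_db in (-10, 0, 10, 20, 30):
    row = ", ".join(f"{name} {outage(name, R, db(snr_db)):.2e}" for name in SCHEMES)
    print(f"{snr_db:>3} dB: {row}")

# Bisection for the SNR where the Ganesan curve crosses below the Alamouti curve
lo, hi = -5.0, 10.0          # Ganesan is worse at -5 dB and better at 10 dB
for _ in range(60):
    mid = (lo + hi) / 2
    if outage("Ganesan", R, db(mid)) > outage("Alamouti", R, db(mid)):
        lo = mid
    else:
        hi = mid
crossing_db = (lo + hi) / 2
print(f"Ganesan outperforms Alamouti above about {crossing_db:.2f} dB")
```

With these parameters, the crossing lands between 1 dB and 1.5 dB, which matches the statement about Figure 5.17.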

In conclusion, diversity is of utmost importance to achieve low outage 
probabilities, and it can be achieved at the transmitter side using STBCs. 


5.3.4 Joint Transmit and Receive Diversity in MIMO Systems 


When both the transmitter and receiver are equipped with multiple antennas, even better reliability against fading can be achieved through the simultaneous use of transmit and receive diversity. The MIMO channel matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ contains $MK$ entries and, under i.i.d. Rayleigh fading, it can provide a diversity order up to $MK$. To this end, we must design a transmission scheme where the channel coefficient becomes proportional to $\|\mathbf{H}\|_{\mathrm{F}}$, which is the Frobenius norm of the matrix defined as follows.

Definition 5.1. The Frobenius norm of the matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ is defined as

$$ \|\mathbf{H}\|_{\mathrm{F}} = \sqrt{\sum_{m=1}^{M}\sum_{k=1}^{K}|h_{m,k}|^2}, \tag{5.87} $$

where $h_{m,k}$ denotes the entry at the $m$th row in the $k$th column.


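The Frobenius norm is a natural matrix extension of the Euclidean vector norm $\|\mathbf{h}\| = \sqrt{\sum_{m=1}^{M}|h_m|^2}$ since it also adds up the squared magnitudes of the entries. The subscript “F” is used in this book for clarity because alternative matrix norms are commonly used for matrix analysis. In this section, we will denote the $k$th column of $\mathbf{H}$ as $\mathbf{h}_k$, so that $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K]$.

The Frobenius norm is closely related to the trace and the singular values $s_1, \ldots, s_r$ of $\mathbf{H}$ (where $r$ is the rank of $\mathbf{H}$) because the following two properties hold:

$$ \|\mathbf{H}\|_{\mathrm{F}}^2 = \mathrm{tr}\left(\mathbf{H}^{\mathrm{H}}\mathbf{H}\right) = \sum_{k=1}^{K}\|\mathbf{h}_k\|^2, \tag{5.88} $$

$$ \|\mathbf{H}\|_{\mathrm{F}}^2 = \sum_{k=1}^{r}s_k^2. \tag{5.89} $$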


Under i.i.d. Rayleigh fading with $h_{m,k} \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$, $\|\mathbf{H}\|_{\mathrm{F}}^2$ has the same distribution as the squared norm of a SIMO/MISO channel with $MK$ antennas. Hence, the squared Frobenius norm has the scaled $\chi^2(2MK)$-distribution that was introduced in Section 2.2.5, which has the PDF

$$ f_{\|\mathbf{H}\|_{\mathrm{F}}^2}(x) = \frac{x^{MK-1}e^{-x/\beta}}{\beta^{MK}(MK-1)!}, \quad x \geq 0. \tag{5.90} $$


A straightforward way to achieve the maximum diversity order is to utilize the repetition scheme from Example 5.9. In this case, the same signal is transmitted from the $K$ transmit antennas over $K$ time instances, using only one transmit antenna at a time, while all $M$ receive antennas are used continuously. The received signal $\mathbf{y}[k] \in \mathbb{C}^M$ at time instance $k$ can be expressed as

$$ \mathbf{y}[k] = \mathbf{h}_k x + \mathbf{n}[k], \quad k = 1, \ldots, K, \tag{5.91} $$

where $x$ is the data symbol and $\mathbf{n}[k] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the independent receiver noise. As the same symbol is repeated, the complete received signal can be expressed as

$$ \begin{bmatrix} \mathbf{y}[1] \\ \vdots \\ \mathbf{y}[K] \end{bmatrix} = \begin{bmatrix} \mathbf{h}_1 \\ \vdots \\ \mathbf{h}_K \end{bmatrix} x + \begin{bmatrix} \mathbf{n}[1] \\ \vdots \\ \mathbf{n}[K] \end{bmatrix}, \tag{5.92} $$

which looks like a SIMO channel with an $MK$-dimensional channel vector containing all the columns of $\mathbf{H}$. It then follows from (3.22) that the (conditional) capacity of this channel is

$$ C_{\mathbf{H}} = \frac{1}{K}\log_2\left(1 + \frac{q}{N_0}\sum_{k=1}^{K}\|\mathbf{h}_k\|^2\right) = \frac{1}{K}\log_2\left(1 + \frac{q\|\mathbf{H}\|_{\mathrm{F}}^2}{N_0}\right), \tag{5.93} $$

where the pre-log factor $1/K$ represents that the same data symbol is repeated over $K$ time slots. The last expression follows from (5.88). If the transmitter selects the data rate $R$, then the outage probability can be computed as

$$ P_{\mathrm{out}}(R) = \Pr\left\{R > \frac{1}{K}\log_2\left(1 + \frac{q\|\mathbf{H}\|_{\mathrm{F}}^2}{N_0}\right)\right\} = 1 - e^{-\frac{2^{KR}-1}{\mathrm{SNR}}}\sum_{m=0}^{MK-1}\frac{1}{m!}\left(\frac{2^{KR}-1}{\mathrm{SNR}}\right)^{m} \tag{5.94} $$

by following the same approach as in (5.53) and defining the SNR as earlier. The diversity order becomes particularly visible in the upper bound

$$ P_{\mathrm{out}}(R) \leq \left(\frac{2^{KR}-1}{\mathrm{SNR}}\right)^{MK}\frac{1}{(MK)!} \tag{5.95} $$

that is obtained through the same steps as in (5.54)-(5.56). This expression shows that the outage probability reduces with an increasing SNR as $\mathrm{SNR}^{-MK}$ at high SNRs, where the bound is tight. Hence, the diversity order is $MK$.
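Since (5.94) relies on $\|\mathbf{H}\|_{\mathrm{F}}^2/\beta$ being Gamma($MK$,1)-distributed, it can be cross-checked by a small Monte Carlo simulation. The minimal sketch below assumes $\beta = 1$ (so that $\mathrm{SNR} = q/N_0$), draws i.i.d. $\mathcal{N}_{\mathbb{C}}(0,1)$ channel entries, and counts how often the capacity (5.93) falls below $R$. The values $M = K = 2$, $R = 1$ bit/symbol, SNR = 5 dB, the number of trials, and the seed are arbitrary illustrative choices.

```python
import math
import random


def repetition_outage_exact(R, snr, M, K):
    """Closed form (5.94) with x = (2^(K R) - 1) / SNR and MK terms in the sum."""
    x = (2 ** (K * R) - 1) / snr
    return 1.0 - math.exp(-x) * sum(x ** m / math.factorial(m) for m in range(M * K))


def repetition_outage_mc(R, snr, M, K, trials, seed=0):
    """Fraction of channel draws for which the capacity (5.93) is below R (beta = 1)."""
    rng = random.Random(seed)
    std = math.sqrt(0.5)                 # real/imaginary std for N_C(0, 1) entries
    outages = 0
    for _ in range(trials):
        frob2 = sum(abs(complex(rng.gauss(0, std), rng.gauss(0, std))) ** 2
                    for _ in range(M * K))          # ||H||_F^2
        if math.log2(1 + snr * frob2) / K < R:
            outages += 1
    return outages / trials


snr = 10 ** (5 / 10)                     # 5 dB
exact = repetition_outage_exact(1.0, snr, M=2, K=2)
simulated = repetition_outage_mc(1.0, snr, M=2, K=2, trials=100_000)
print(f"closed form (5.94): {exact:.4f}, Monte Carlo: {simulated:.4f}")
```

The two numbers agree up to the Monte Carlo error, which shrinks as one over the square root of the number of trials.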

The same diversity gain can be achieved using STBCs, which can be designed to outperform the repetition scheme for every given SNR value. The repetition scheme can be viewed as an inefficient STBC achieving maximum diversity but with an unnecessarily low coding rate of $\eta_{\mathrm{c}} = 1/K$. While each STBC is designed for a particular number of transmit antennas, it can be directly applied along with any number of receive antennas. For example, the Alamouti and Ganesan codes use $K = 2$ and $K = 4$ transmit antennas, respectively, while the number of receive antennas $M$ can be arbitrary.

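In the MISO case, the received signal in (5.67) with the Alamouti code has the form $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{n} \in \mathbb{C}^2$. In the MIMO case, the received signal $\mathbf{y}_m \in \mathbb{C}^2$ at receive antenna $m$ can instead be expressed as

$$ \mathbf{y}_m = \mathbf{X}\vec{\mathbf{h}}_m + \mathbf{n}_m, \quad m = 1, \ldots, M, \tag{5.96} $$

where $\mathbf{n}_m \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_2)$ is the receiver noise and $\vec{\mathbf{h}}_m^{\mathrm{T}}$, with $\vec{\mathbf{h}}_m \in \mathbb{C}^K$, is the $m$th row of the channel matrix (the arrow notation points out that rows are horizontal):

$$ \mathbf{H} = \begin{bmatrix} \vec{\mathbf{h}}_1^{\mathrm{T}} \\ \vdots \\ \vec{\mathbf{h}}_M^{\mathrm{T}} \end{bmatrix}. \tag{5.97} $$

By processing the received signal as described in Section 5.3.3, the counterpart to (5.70) for the $m$th receive antenna is the processed received signal

$$ \frac{\|\vec{\mathbf{h}}_m\|}{\sqrt{2}}\begin{bmatrix} z[1] \\ z[2] \end{bmatrix} + \tilde{\mathbf{n}}_m, \quad \tilde{\mathbf{n}}_m \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_2). \tag{5.98} $$

At this stage, the receiver can use MRC to combine the signals over the $M$ receive antennas, which will result in a summation of the channel gains:

$$ \sum_{m=1}^{M}\frac{\|\vec{\mathbf{h}}_m\|}{\|\mathbf{H}\|_{\mathrm{F}}}\left(\frac{\|\vec{\mathbf{h}}_m\|}{\sqrt{2}}\begin{bmatrix} z[1] \\ z[2] \end{bmatrix} + \tilde{\mathbf{n}}_m\right) = \frac{\|\mathbf{H}\|_{\mathrm{F}}}{\sqrt{2}}\begin{bmatrix} z[1] \\ z[2] \end{bmatrix} + \sum_{m=1}^{M}\frac{\|\vec{\mathbf{h}}_m\|}{\|\mathbf{H}\|_{\mathrm{F}}}\tilde{\mathbf{n}}_m, \tag{5.99} $$

where we used that $\sum_{m=1}^{M}\|\vec{\mathbf{h}}_m\|^2 = \|\mathbf{H}\|_{\mathrm{F}}^2$; the combined noise term is still $\mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_2)$ because the combining weights have unit norm.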


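Hence, we can reuse the outage probability expressions from earlier in this chapter but replace $\|\mathbf{h}\|^2$ with $\|\mathbf{H}\|_{\mathrm{F}}^2$. The diversity order with the Alamouti code increases from 2 to $2M$, and the outage probability in (5.74) becomes

$$ P_{\mathrm{out}}(R) = \Pr\left\{R > \log_2\left(1 + \frac{q\|\mathbf{H}\|_{\mathrm{F}}^2}{2N_0}\right)\right\} = 1 - e^{-\frac{2(2^{R}-1)}{\mathrm{SNR}}}\sum_{m=0}^{2M-1}\frac{1}{m!}\left(\frac{2(2^{R}-1)}{\mathrm{SNR}}\right)^{m}. \tag{5.100} $$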


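The same approach of combining the signals over the $M$ receive antennas can be used along with the Ganesan code, in which case the diversity order increases from 4 to $4M$, and the outage probability in (5.83) generalizes to

$$ P_{\mathrm{out}}(R) = \Pr\left\{R > \frac{3}{4}\log_2\left(1 + \frac{q\|\mathbf{H}\|_{\mathrm{F}}^2}{3N_0}\right)\right\} = 1 - e^{-\frac{3(2^{4R/3}-1)}{\mathrm{SNR}}}\sum_{m=0}^{4M-1}\frac{1}{m!}\left(\frac{3(2^{4R/3}-1)}{\mathrm{SNR}}\right)^{m}. \tag{5.101} $$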
Example 5.10. What is the outage probability if the transmitter sends an independent data symbol with equal power from each antenna?

If the transmitter sends $\mathbf{x} = [x_1, \ldots, x_K]^{\mathrm{T}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \frac{q}{K}\mathbf{I}_K)$, the receiver can decode the data streams sequentially as described in Section 3.4.3 with $\mathbf{P} = \mathbf{I}_K$ and $\mathbf{Q} = \frac{q}{K}\mathbf{I}_K$. The achievable data rate for $x_k$ is stated in (3.115) as $\log_2\left(1 + \frac{q}{KN_0}\mathbf{h}_k^{\mathrm{H}}\mathbf{C}_{k+1}^{-1}\mathbf{h}_k\right)$, where $\mathbf{C}_{k+1} = \mathbf{I}_M + \sum_{i=k+1}^{K}\frac{q}{KN_0}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}}$ and $\mathbf{h}_k$ is the $k$th column of $\mathbf{H}$. To reach the total data rate $R$, the transmitter can use the rate $R/K$ for each stream. The resulting outage probability is

$$ P_{\mathrm{out}}(R) = \Pr\left\{\bigcup_{k=1}^{K}\left\{\frac{R}{K} > \log_2\left(1 + \frac{q}{KN_0}\mathbf{h}_k^{\mathrm{H}}\mathbf{C}_{k+1}^{-1}\mathbf{h}_k\right)\right\}\right\} \tag{5.102} $$

because an outage occurs if at least one of the $K$ streams does not support the rate $R/K$. Since each signal is sent from one antenna to $M$ receive antennas, the diversity order is $M$ instead of $MK$. Hence, this transmission scheme sacrifices diversity to achieve the maximum multiplexing gain of $\min(M, K)$.

Figure 5.18: The outage probability of MIMO systems with $M = K = 4$ antennas and i.i.d. Rayleigh fading. The rate is $R = 4$ bit/symbol, and three schemes are compared. The STBC (Ganesan code) and repetition scheme achieve the maximum diversity order of 16; however, the former is more efficient thanks to the higher coding rate. Spatial multiplexing of four streams leads to lower outage probability at low and medium SNRs, but the reduced diversity order makes it less efficient than the STBC at higher SNRs.
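The outage probability (5.102) of spatial multiplexing in Example 5.10 has no simple closed form, but it can be estimated by Monte Carlo simulation. The following minimal sketch assumes, as in Example 5.10, that each of the $K$ antennas transmits with power $q/K$, and it uses $\beta = 1$ so that $q/(KN_0) = \mathrm{SNR}/K$. For each channel draw, it computes the $K$ successive-decoding rates by solving $\mathbf{C}_{k+1}\mathbf{u} = \mathbf{h}_k$ with a small hand-written Gaussian elimination (instead of forming the inverse) and declares an outage if any rate is below $R/K$. The parameters mirror Figure 5.18 ($M = K = 4$, $R = 4$ bit/symbol); the helper names, the number of trials, and the seed are our own choices, and with only 2000 trials the estimates are rough.

```python
import math
import random


def solve(A, b):
    """Solve A u = b for a small complex matrix by Gaussian elimination with pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    u = [0j] * n
    for r in range(n - 1, -1, -1):
        u[r] = (aug[r][n] - sum(aug[r][c] * u[c] for c in range(r + 1, n))) / aug[r][r]
    return u


def sic_rates(cols, rho):
    """Rates log2(1 + rho h_k^H C_{k+1}^{-1} h_k) with C_{k+1} = I + rho sum_{i>k} h_i h_i^H."""
    M, K = len(cols[0]), len(cols)
    rates = []
    for k in range(K):
        C = [[(1.0 if i == j else 0.0)
              + rho * sum(cols[t][i] * cols[t][j].conjugate() for t in range(k + 1, K))
              for j in range(M)] for i in range(M)]
        u = solve(C, cols[k])
        sinr = rho * sum(a.conjugate() * b for a, b in zip(cols[k], u)).real
        rates.append(math.log2(1 + sinr))
    return rates


def sm_outage(R, snr, M, K, trials, seed=0):
    """Monte Carlo estimate of (5.102) with power q/K per antenna and beta = 1."""
    rng = random.Random(seed)
    std = math.sqrt(0.5)
    rho = snr / K                       # q / (K N0)
    count = 0
    for _ in range(trials):
        cols = [[complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(M)]
                for _ in range(K)]     # cols[k] is the kth column h_k of H
        if min(sic_rates(cols, rho)) < R / K:
            count += 1
    return count / trials


estimates = {}
for snr_db in (0, 10, 20):
    estimates[snr_db] = sm_outage(R=4.0, snr=10 ** (snr_db / 10), M=4, K=4, trials=2000)
    print(f"{snr_db:>2} dB: estimated outage probability {estimates[snr_db]:.4f}")
```

Reusing the same seed for every SNR means that the same channel draws are evaluated each time; since the rate of every stream grows with $q/N_0$, the estimated outage probability can then only decrease as the SNR increases.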


Figure 5.18 exemplifies how the outage probabilities vary with the SNR in a MIMO system with $M = K = 4$ antennas. The data rate is $R = 4$ bit/symbol, which is selected to be larger than in previous examples since the system is now capable of spatial multiplexing. We compare the use of an STBC (Ganesan code) with spatial multiplexing of four parallel signals as in (5.102) and the repetition scheme in (5.94). Spatial multiplexing can more easily achieve large rates at moderate SNRs, which results in the lowest outage probability for SNRs below 10.3 dB. However, this scheme only achieves diversity order 4 since each symbol is transmitted from a single antenna and received by $M = 4$ antennas. The STBC gives a much lower outage probability at higher SNRs because it achieves the maximum diversity order of $MK = 16$. This is why the corresponding curve is extremely steep when it begins decaying. The repetition scheme also achieves the maximum diversity order. However, the curve is shifted by 27 dB to the right since it only transmits one symbol per four time instances, which is very inefficient compared to the Ganesan code.
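The 27 dB shift between the repetition and STBC curves in Figure 5.18 follows directly from (5.94) and (5.101). With $M = K = 4$, both outage probabilities equal the Gamma(16,1) CDF evaluated at a threshold divided by the SNR: $3(2^{4R/3}-1)$ for the Ganesan code with MRC and $2^{KR}-1$ for the repetition scheme. Equal outage probabilities therefore require SNRs that differ by the ratio of the thresholds. The short sketch below computes this ratio for $R = 4$ bit/symbol and confirms that the two expressions coincide after the shift; `gamma_cdf` is our own helper for the Gamma CDF.

```python
import math


def gamma_cdf(x, M):
    """Pr{G <= x} for G ~ Gamma(M, 1), accurate also when the result is tiny."""
    if x <= 0:
        return 0.0
    if x >= M:
        return 1.0 - math.exp(-x) * sum(x ** m / math.factorial(m) for m in range(M))
    term = math.exp(-x) * x ** M / math.factorial(M)   # first term of the tail series
    total, k = term, M
    while term > 1e-18 * total:
        k += 1
        term *= x / k
        total += term
    return total


M, K, R = 4, 4, 4.0
ganesan_threshold = 3 * (2 ** (4 * R / 3) - 1)   # SNR times the Gamma argument in (5.101)
repetition_threshold = 2 ** (K * R) - 1          # SNR times the Gamma argument in (5.94)
shift_db = 10 * math.log10(repetition_threshold / ganesan_threshold)
print(f"repetition curve is shifted {shift_db:.2f} dB to the right")

# Both schemes have diversity order 4M = MK = 16, so equal outage needs equal arguments
snr = 10 ** (20 / 10)
p_ganesan = gamma_cdf(ganesan_threshold / snr, 4 * M)
p_repetition = gamma_cdf(repetition_threshold / (snr * 10 ** (shift_db / 10)), M * K)
print(f"Ganesan at 20 dB: {p_ganesan:.3e}, repetition at {20 + shift_db:.1f} dB: {p_repetition:.3e}")
```

The shift is close to 27.4 dB, consistent with the value stated for Figure 5.18.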
This concluding example is a reminder that there are two conflicting design goals in slow-fading scenarios: achieving a high data rate $R$ and maintaining a low outage probability $P_{\mathrm{out}}(R)$. In practice, the acceptable outage probability $\epsilon$ might be a given design parameter, and then the remaining goal is to achieve the $\epsilon$-outage capacity. In the MIMO case, the preferred choice between spatial multiplexing and STBCs, or something in between, depends on $\epsilon$. We refer to [26, Ch. 9] and [64] for a deeper theoretical framework for analyzing the diversity-multiplexing tradeoff.

5.4 Capacity Concept with Fast Fading


We now shift the attention to fast fading, where many channel realizations occur during the signal transmission. We begin by revisiting the memoryless SISO channel in (2.130), where the received signal at the time instance $l$ is

$$ y[l] = h[l]\,x[l] + n[l], \tag{5.103} $$

while $x[l]$ is the transmitted signal and $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is noise. The important new property in this section is that the channel coefficient $h[l]$ is time-dependent. For notational convenience, we will first consider the scenario where the channel takes a new independent realization at every time instance. The fading is so fast that the channel varies at the symbol rate but is constant within each symbol transmission interval. Later in this chapter, we will extend the analysis to the case where the channel is constant over a finite block of time instances before a new independent realization occurs.


The channel can be viewed as a continuous-time random process (see Section 2.2.7) from which we take samples at the symbol rate to obtain the sequence of channel realizations: $h[1], h[2], \ldots$. We assume the random process has zero mean and is stationary, which implies that each sample has the same statistics. We further assume that $h[l]$ is independent for each $l$ and the variance is denoted by $\mathbb{E}\{|h[l]|^2\} = \beta$. The receiver knows the channel realizations perfectly, while the transmitter only knows the channel statistics. The fast channel variations lead to massive diversity since many fading realizations are observed, which makes the capacity analysis much different from the slow-fading scenario; for example, we will demonstrate that outage-free communication can be achieved without CSI at the transmitter.


The capacity of a non-fading channel is $C = \max_{f(x)}\left(\mathcal{H}(y) - \mathcal{H}(y|x)\right)$ bit/symbol, as defined in (2.133). To understand its operational meaning, we need to consider the transmission of a packet containing $L$ data symbols: $x[1], \ldots, x[L]$. If each symbol is encoded to represent $C$ bits, then the error probability in the decoding at the receiver goes to zero as $L \to \infty$.

The original capacity expression has no time index since the channel response was assumed constant throughout the communication. Hence, we need to derive a different expression that can be applied to a fast-fading channel. The starting point is to transmit a packet of length $L$ with the received signal given by (5.103) for $l = 1, \ldots, L$, which is affected by the independent fading realizations $h[1], \ldots, h[L]$. We let $f_{x[1],\ldots,x[L]}(x[1], \ldots, x[L])$ denote the joint PDF of the $L$ data symbols $x[1], \ldots, x[L]$. The capacity can then be generalized as [65, Ch. 4]

$$ C = \lim_{L \to \infty}\ \max_{f_{x[1],\ldots,x[L]}(x[1],\ldots,x[L])} R_L, \tag{5.104} $$



where the average data rate over $L$ time instances is

$$ R_L = \frac{1}{L}\Big(\mathcal{H}\big(y[1], \ldots, y[L] \,\big|\, h[1], \ldots, h[L]\big) - \mathcal{H}\big(y[1], \ldots, y[L] \,\big|\, x[1], \ldots, x[L], h[1], \ldots, h[L]\big)\Big) \tag{5.105} $$

and the differential entropies are conditioned on the $L$ channel realizations. Since the channel is memoryless and the channel realizations are independent, the differential entropies in (5.105) can be decoupled using the chain rule in (2.136) as


$$ \mathcal{H}\big(y[1], \ldots, y[L] \,\big|\, h[1], \ldots, h[L]\big) \leq \sum_{l=1}^{L}\mathcal{H}\big(y[l] \,\big|\, h[l]\big), \tag{5.106} $$

$$ \mathcal{H}\big(y[1], \ldots, y[L] \,\big|\, x[1], \ldots, x[L], h[1], \ldots, h[L]\big) = \sum_{l=1}^{L}\mathcal{H}\big(y[l] \,\big|\, x[l], h[l]\big), \tag{5.107} $$

where the upper bound in (5.106) is achieved when the symbols $x[1], \ldots, x[L]$ are designed to be independent so that the received signals $y[l]$ are conditionally independent (given $h[l]$) for $l = 1, \ldots, L$. Since the capacity in (5.104) is achieved by maximizing $R_L$ with respect to the symbol distribution $f_{x[1],\ldots,x[L]}(x[1], \ldots, x[L])$, we should let the $L$ signals be independent so the upper bound is achieved. By inserting these expressions into (5.104) and (5.105), we obtain


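$$ C = \lim_{L \to \infty}\frac{1}{L}\sum_{l=1}^{L}\max_{f_{x[l]}(x[l])}\Big(\mathcal{H}\big(y[l] \,\big|\, h[l]\big) - \mathcal{H}\big(y[l] \,\big|\, x[l], h[l]\big)\Big) = \lim_{L \to \infty}\frac{1}{L}\sum_{l=1}^{L}\log_2\left(1 + \frac{q|h[l]|^2}{N_0}\right). \tag{5.108} $$

In the last step, we utilized the capacity result in Corollary 2.1 to conclude that $x[l] \sim \mathcal{N}_{\mathbb{C}}(0, q)$ is the optimal symbol distribution, which leads to an expression having the familiar $\log_2(1 + \mathrm{SNR})$ structure.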


It remains to compute the limit in (5.108). We notice that this expression is the sample average of the data rate $\log_2(1 + q|h[l]|^2/N_0)$. Since the channel realizations are independent, the rate realizations are also independent. The limit of the sample average of independent and identically distributed realizations can be computed using Lemma 2.4, which is known as the law of large numbers. As long as the random data rates have finite variance, we can use this lemma to establish that

$$ C = \lim_{L \to \infty}\frac{1}{L}\sum_{l=1}^{L}\log_2\left(1 + \frac{q|h[l]|^2}{N_0}\right) = \mathbb{E}\left\{\log_2\left(1 + \frac{q|h|^2}{N_0}\right)\right\}, \tag{5.109} $$

where the sample average converges to the statistical mean. This is the mean of the conditional capacity $C_h$ in (5.37) with respect to the fading channel $h$, which is the only random variable in the expression. The expression in (5.109) holds for any practical fading distribution because the variance is always finite.⁸ However, we will mainly analyze the Rayleigh fading case where $h \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$. We summarize the capacity result as follows.


Corollary 5.1. Consider the discrete memoryless fast-fading channel with input $x[l] \in \mathbb{C}$ and output $y[l] \in \mathbb{C}$ given by

$$ y[l] = h[l]\,x[l] + n[l], \tag{5.110} $$

where $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. Suppose the input distribution is feasible whenever the symbol power satisfies $\mathbb{E}\{|x[l]|^2\} \leq q$. Furthermore, suppose the channel $h[l]$ takes independent and identically distributed realizations at every time instance $l$ from a distribution with finite variance. If the channel realizations are known at the output, the channel capacity is

$$ C = \mathbb{E}\left\{\log_2\left(1 + \frac{q|h|^2}{N_0}\right)\right\} \text{ bit/symbol} \tag{5.111} $$

and is achieved when $x[l] \sim \mathcal{N}_{\mathbb{C}}(0, q)$ and independent for each $l$.
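The limit in (5.108)-(5.109) can be illustrated by simulation. The minimal sketch below assumes $\beta = 1$ and $\mathrm{SNR} = 10$ dB, draws independent normalized gains $|h[l]|^2 \sim \mathrm{Exp}(1)$ (the Rayleigh fading case), and prints the running sample average of $\log_2(1 + q|h[l]|^2/N_0)$ for increasing $L$ next to the mean value. The mean value is computed with a simple midpoint rule on the truncated interval $[0, 50]$, which is our own choice of numerical method.

```python
import math
import random

SNR = 10.0   # q * beta / N0 = 10 dB, with beta = 1


def rate(g):
    """Data rate log2(1 + q |h|^2 / N0) for the normalized gain g = |h|^2 / beta."""
    return math.log2(1 + SNR * g)


# Reference value of E{log2(1 + SNR g)} with g ~ Exp(1): midpoint rule on [0, 50]
steps, upper = 200_000, 50.0
dg = upper / steps
reference = dg * sum(rate((i + 0.5) * dg) * math.exp(-(i + 0.5) * dg) for i in range(steps))

rng = random.Random(2024)
running_sum, averages = 0.0, {}
for L in range(1, 100_001):
    running_sum += rate(rng.expovariate(1.0))   # independent Exp(1) gains over l
    if L in (10, 100, 1_000, 10_000, 100_000):
        averages[L] = running_sum / L
        print(f"L = {L:>6}: sample average {averages[L]:.4f}, mean value {reference:.4f}")
```

The sample averages fluctuate noticeably for small $L$ and settle near the mean value as $L$ grows, which is the behavior that the law of large numbers guarantees.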


The term ergodic is used in statistics to describe random processes for which the time average approaches the statistical mean. This property was used in (5.109) to obtain the capacity as the mean value of $\log_2(1 + q|h|^2/N_0)$. The proof was based on Lemma 2.4 (the law of large numbers) and utilized the assumption that the temporal channel realizations $h[1], h[2], \ldots$ are mutually independent. This is a sufficient but unnecessarily strong assumption. The same capacity expression is obtained when the channel realizations are samples from any ergodic random process, which generally features a weak temporal correlation that vanishes with time. For example, suppose the receiver moves at the speed $v$ along a straight line in a rich multipath environment, and the data symbol $l$ is transmitted at time $t = l/B$, where $B$ is the bandwidth. It follows from (5.33) that the temporal correlation between the channel coefficients $h[1]$ and $h[l]$ is


$$ \mathbb{E}\{h[1]h^*[l]\} = \beta\,\mathrm{sinc}\left(\frac{2v(l-1)}{\lambda B}\right), \tag{5.112} $$

which goes to zero as $l \to \infty$. A weak law of large numbers can be established under these conditions [66, Ex. 254]. The truly necessary condition is that all possible channel realizations are obtained over time, according to the underlying stationary statistical distribution, which ergodicity implies.

⁸The magnitude of the channel coefficient is upper bounded as $|h| \leq 1$ in practice because we cannot receive more power than was transmitted. This implies that the variance of $h$ can also not be larger than 1.

The capacity in (5.111) is called the ergodic capacity for the aforementioned reasons, and this term is used to distinguish it from the conventional capacity
of non-fading channels. Since the ergodic capacity in (5.111) is a deterministic 
constant that can be computed by the transmitter using only statistical 
knowledge (i.e., the distributions of the channel and noise), the transmitter 
can deduce how to encode the data without knowing the channel realizations. 
Hence, unlike the slow-fading scenario, the transmitter does not need to guess 
which data rate the channel supports. By encoding the data based on the 
ergodic capacity value, we can achieve reliable (outage-free) communications 
as in the non-fading channels considered in previous chapters. 


Example 5.11. Compute the ergodic capacity for a Rayleigh fading channel, using the exponential integral function $E_1(x) = \int_1^{\infty}\frac{e^{-xw}}{w}\,dw$.

We consider the channel distribution $h \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$. The squared magnitude of a complex Gaussian random variable is exponentially distributed (see Section 2.2.5), thus the ergodic capacity in (5.111) can be expressed as

$$ C = \mathbb{E}\{\log_2(1 + z)\}, \tag{5.113} $$

where $z = \frac{q|h|^2}{N_0} \sim \mathrm{Exp}(1/\mathrm{SNR})$ and the average SNR is defined as $\mathrm{SNR} = \frac{q\beta}{N_0}$. By using this distribution, we can compute the mean value in (5.113) as

$$ \begin{aligned} C &= \int_0^{\infty}\log_2(1+z)\frac{1}{\mathrm{SNR}}e^{-\frac{z}{\mathrm{SNR}}}\,dz = e^{\frac{1}{\mathrm{SNR}}}\int_1^{\infty}\log_2(w)\frac{1}{\mathrm{SNR}}e^{-\frac{w}{\mathrm{SNR}}}\,dw \\ &= \log_2(e)\,e^{\frac{1}{\mathrm{SNR}}}\int_1^{\infty}\frac{1}{w}e^{-\frac{w}{\mathrm{SNR}}}\,dw = \log_2(e)\,e^{\frac{1}{\mathrm{SNR}}}E_1\!\left(\frac{1}{\mathrm{SNR}}\right), \end{aligned} \tag{5.114} $$

where the second equality follows from a variable change to $w = 1 + z$, and the third equality follows from an integration-by-parts approach (where some terms are omitted since they equal zero). The final expression utilizes the definition of the exponential integral function $E_1(x)$. This is an established analytical function that is implemented in many software libraries.

We can get further analytical insights by bounding the function as $\frac{1}{2}\ln(1 + 2/x) < e^{x}E_1(x) < \ln(1 + 1/x)$ [67, §5.1.20], which implies that

$$ \frac{1}{2}\log_2(1 + 2\,\mathrm{SNR}) < C < \log_2(1 + \mathrm{SNR}). \tag{5.115} $$

This chain of inequalities shows that the ergodic capacity is smaller than the capacity $\log_2(1 + \mathrm{SNR})$ of the corresponding non-fading channel, but the relative loss cannot surpass 1/2 due to the structure of the lower bound.
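Python's standard library has no exponential integral (SciPy offers it as `scipy.special.exp1`), so the following minimal sketch implements the scaled function $e^{x}E_1(x)$ itself: the classical power series for $x \leq 1$ and a continued fraction evaluated with the modified Lentz method for $x > 1$. Working with $e^{x}E_1(x)$ rather than $E_1(x)$ avoids overflow of $e^{1/\mathrm{SNR}}$ at low SNR. The sketch then evaluates (5.114) and checks the bounds in (5.115); the function names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329


def scaled_e1(x):
    """Return exp(x) * E_1(x) for x > 0, where E_1(x) = int_1^inf exp(-x w) / w dw."""
    if x <= 0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E_1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        series, term, k = 0.0, 1.0, 0
        while True:
            k += 1
            term *= -x / k                 # term = (-x)^k / k!
            series += term / k
            if abs(term / k) < 1e-17:
                break
        return math.exp(x) * (-EULER_GAMMA - math.log(x) - series)
    # Continued fraction for exp(x) * E_1(x), evaluated with the modified Lentz method
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    result = d
    for i in range(1, 10_000):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        result *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return result


def ergodic_capacity_rayleigh(snr):
    """Closed form (5.114): log2(e) * exp(1/SNR) * E_1(1/SNR)."""
    return math.log2(math.e) * scaled_e1(1.0 / snr)


rows = {}
for snr_db in (-10, 0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)
    rows[snr_db] = (0.5 * math.log2(1 + 2 * snr), ergodic_capacity_rayleigh(snr),
                    math.log2(1 + snr))
    lower, cap, upper = rows[snr_db]
    print(f"{snr_db:>3} dB: {lower:.4f} < C = {cap:.4f} < {upper:.4f}")
```

At 10 dB, the closed form gives about 2.91 bit/symbol, compared with $\log_2(11) \approx 3.46$ bit/symbol without fading.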


The closed-form ergodic capacity expression for Rayleigh fading in (5.114) enables efficient numerical evaluation but is not amenable to analysis. We will therefore continue the comparison between a Rayleigh fading channel $h \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$ and the capacity $\log_2\left(1 + \frac{q\beta}{N_0}\right)$ of the corresponding non-fading channel by starting from the mean value expression in (5.111). At low SNR, we can use the approximation in (3.2) to obtain that the capacity difference is

$$ \mathbb{E}\left\{\log_2\left(1 + \frac{q|h|^2}{N_0}\right)\right\} - \log_2\left(1 + \frac{q\beta}{N_0}\right) \approx \log_2(e)\frac{q\,\mathbb{E}\{|h|^2\}}{N_0} - \log_2(e)\frac{q\beta}{N_0} = 0. \tag{5.116} $$


Consequently, there is no capacity loss from having a fading channel when the SNR is low. The reason is that the capacity is then an approximately linear function of $|h|^2$; the realizations where $|h|^2$ is below the average $\beta$ are fully compensated by the realizations that are above the average. The fading sometimes makes the channel stronger and sometimes weaker, but it behaves like a non-fading channel on average.

The situation is more troublesome at high SNR, where the capacity difference (in bit/symbol) is

$$ \mathbb{E}\left\{\log_2\left(1 + \frac{q|h|^2}{N_0}\right)\right\} - \log_2\left(1 + \frac{q\beta}{N_0}\right) \approx \mathbb{E}\left\{\log_2\left(\frac{q|h|^2}{N_0}\right)\right\} - \log_2\left(\frac{q\beta}{N_0}\right) = \mathbb{E}\left\{\log_2\left(\frac{|h|^2}{\beta}\right)\right\} \approx -0.83, \tag{5.117} $$

where the first approximation follows from (3.3) and the second approximation is obtained by computing the mean value numerically and presenting it with two significant digits. These results imply a negligible capacity loss in Rayleigh fading channels at low SNRs, while the loss approaches 0.83 bit/symbol at high SNRs. The reason for this loss is that the capacity grows slower and slower with the SNR since it is a logarithmic function of it. Hence, the realizations of $|h|^2$ that are below the average $\beta$ degrade the rate by more than the realizations of $|h|^2$ that are above the average improve it.
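The constant in (5.117) also has a closed form. Since $|h|^2/\beta \sim \mathrm{Exp}(1)$ under Rayleigh fading, the mean of its natural logarithm equals the derivative of the Gamma function at one, which is $-\gamma$, where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant:

$$ \mathbb{E}\left\{\log_2\left(\frac{|h|^2}{\beta}\right)\right\} = \frac{1}{\ln 2}\int_0^{\infty}\ln(g)\,e^{-g}\,dg = \frac{\Gamma'(1)}{\ln 2} = -\frac{\gamma}{\ln 2} \approx -0.8327. $$

Rounded to two significant digits, this is the value $-0.83$ used in (5.117).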


These differences are illustrated in Figure 5.19, where the ergodic capacity of a Rayleigh fading channel is compared with the capacity of a non-fading channel when $\mathrm{SNR} = q\beta/N_0$ is the same. The performance difference is negligible when the SNR is below $-10$ dB, but then it begins to grow. The high-SNR gap of 0.83 bit/symbol is approximately reached when the SNR is 30 dB. In summary, fading reduces communication performance, particularly at high SNR, where we want communication systems to operate. Fortunately, this adverse effect can be mitigated using multiple antennas, as shown next.


5.4.1 Ergodic Capacity of i.i.d. Rayleigh Fading SIMO Channels 


The ergodic capacity is the mean value of the data rate achieved for a single channel realization. For a SIMO channel with $M$ antennas, we can therefore generalize the SISO ergodic capacity in (5.111) to the SIMO ergodic capacity

$$ C = \mathbb{E}\left\{\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)\right\} \text{ bit/symbol}. \tag{5.118} $$

This is the mean value of the conditional capacity $C_{\mathbf{h}}$ in (5.50). The expression in (5.118) holds for any channel distribution (with bounded variance). We consider an i.i.d. Rayleigh fading channel: $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta\mathbf{I}_M)$. It is then possible to compute (5.118) exactly, following the approach in Example 5.11, but the expression is complicated and provides little insight.⁹ We will instead compute lower and upper bounds on the ergodic capacity and compare them with the capacity $\log_2(1 + M\,\mathrm{SNR})$ of the corresponding non-fading SIMO channel, for which $\|\mathbf{h}\|^2 = \beta M$ and we still use the definition $\mathrm{SNR} = q\beta/N_0$. To this end, we will make use of a few mathematical properties.

Figure 5.19: The ergodic capacity of a Rayleigh fading SISO system for a varying SNR is compared with the corresponding capacity of a non-fading channel. There is no gap at low SNR, while the capacity loss of having a fading channel approaches 0.83 bit/symbol at high SNR.


Definition 5.2. A twice-differentiable scalar function $f(x)$ is said to be convex if $\frac{d^2 f(x)}{dx^2} \geq 0$ for all $x$, while it is concave if $\frac{d^2 f(x)}{dx^2} \leq 0$ for all $x$.


The graph of a convex function is shaped like a cup, $\cup$, in the sense that the line segment between any two points on the graph is above the graph. By contrast, the graph of a concave function is shaped like a cap, $\cap$, and has the opposite property. The expectation of a convex or concave function behaves differently, as shown by the following result, called Jensen's inequality.


⁹We refer to [1, Lemma B.15] for the complete result and derivation.

Lemma 5.1. Consider a scalar random variable $x$ and scalar function $f(x)$. If $f(x)$ is a convex function, then

$$ f(\mathbb{E}\{x\}) \leq \mathbb{E}\{f(x)\}. \tag{5.119} $$

If $f(x)$ is a concave function, then

$$ f(\mathbb{E}\{x\}) \geq \mathbb{E}\{f(x)\}. \tag{5.120} $$


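If we set $x = \|\mathbf{h}\|^2$, we can notice that $f(x) = \log_2\left(1 + \frac{qx}{N_0}\right)$ is a concave function of $x$. Hence, we can use (5.120) to conclude that

$$ \mathbb{E}\left\{\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)\right\} \leq \log_2\left(1 + \frac{q\,\mathbb{E}\{\|\mathbf{h}\|^2\}}{N_0}\right) = \log_2\left(1 + M\,\mathrm{SNR}\right), \tag{5.121} $$

where we utilized that $\mathbb{E}\{\|\mathbf{h}\|^2\} = M\beta$. This upper bound coincides with the capacity of a non-fading LOS channel with the same average SNR. Hence, the ergodic capacity can never be larger than the capacity of a corresponding LOS channel.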

To determine how much smaller the ergodic capacity can be, we can utilize Jensen's inequality again. This time we set $x = 1/\|\mathbf{h}\|^2$ and notice that $f(x) = \log_2\left(1 + \frac{q}{xN_0}\right)$ is a convex function of $x$. It follows from (5.119) that

$$ \mathbb{E}\left\{\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)\right\} \geq \log_2\left(1 + \frac{q}{N_0\,\mathbb{E}\{1/\|\mathbf{h}\|^2\}}\right) = \log_2\left(1 + \frac{q(M-1)\beta}{N_0}\right) = \log_2\left(1 + (M-1)\mathrm{SNR}\right), \tag{5.122} $$

where the first equality utilizes that

$$ \mathbb{E}\left\{\frac{1}{\|\mathbf{h}\|^2}\right\} = \int_0^{\infty}\frac{1}{x}\,\frac{x^{M-1}e^{-x/\beta}}{\beta^M(M-1)!}\,dx = \frac{1}{(M-1)\beta}\underbrace{\int_0^{\infty}\frac{x^{M-2}e^{-x/\beta}}{\beta^{M-1}(M-2)!}\,dx}_{=1} = \frac{1}{(M-1)\beta}. \tag{5.123} $$

This mean value is computed by using the PDF in (5.51) of the scaled $\chi^2(2M)$-distribution, and the last equality follows by recognizing that the integral over the PDF of the scaled $\chi^2(2(M-1))$-distribution is one.

In summary, we have derived the following chain of inequalities:

$$ \log_2\left(1 + (M-1)\mathrm{SNR}\right) \leq \mathbb{E}\left\{\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)\right\} \leq \log_2\left(1 + M\,\mathrm{SNR}\right). \tag{5.124} $$

The gap between the lower and upper bounds can be substantial when $M$ is small (as seen in the SISO case) but reduces as $M$ increases. This is a consequence of the spatial receive diversity; as shown in (5.58), the relative variations in $\|\mathbf{h}\|^2$ reduce the more antennas are used. In the context of ergodic capacities, this phenomenon is commonly called channel hardening [68]. This means that the influence of the random channel variations on the capacity disappears as $M$ increases if MRC is used at the receiver. The fading still exists, and the entries of the channel vector $\mathbf{h}$ vary rapidly. However, the variations (partially) average out when the $M$ independent random variables are combined in the SNR-maximizing way.


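Example 5.12. What is the PDF of $C_{\mathbf{h}} = \log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right)$ if $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta\mathbf{I}_M)$?

The PDF $f_{\|\mathbf{h}\|^2}(x)$ of $\|\mathbf{h}\|^2$ is given in (5.51) and can be used to derive the PDF $f_{C_{\mathbf{h}}}(z)$ of $C_{\mathbf{h}}$. The connection between $\|\mathbf{h}\|^2$ and $C_{\mathbf{h}}$ is most easily exposed through the CDFs, but as stated in (2.101), the PDF is the derivative of the CDF. Hence, we can compute the desired PDF as

$$\begin{aligned}
f_{C_{\mathbf{h}}}(z) &= \frac{d}{dz}\Pr\{C_{\mathbf{h}} \leq z\} = \frac{d}{dz}\Pr\left\{\log_2\left(1 + \frac{q\|\mathbf{h}\|^2}{N_0}\right) \leq z\right\} \\
&= \frac{d}{dz}\Pr\left\{\|\mathbf{h}\|^2 \leq \frac{N_0}{q}(2^z - 1)\right\} = \frac{N_0}{q}\,2^z\ln(2)\,f_{\|\mathbf{h}\|^2}\left(\frac{N_0}{q}(2^z - 1)\right) \\
&= \frac{(2^z - 1)^{M-1}e^{-\frac{2^z - 1}{\mathrm{SNR}}}}{\mathrm{SNR}^M(M-1)!}\,2^z\ln(2), \quad \text{for } z \geq 0,
\end{aligned} \tag{5.125}$$

where the chain rule is used when computing the derivative and $\mathrm{SNR} = \frac{q\beta}{N_0}$ is used in the last step. This expression was used in Figure 5.10 for $M = 1$.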
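One way to check (5.125) is to evaluate the PDF on a grid and integrate it numerically. The area should be one, and the first moment gives the ergodic capacity. The Python sketch below computes the PDF in the log domain (via `math.lgamma`) so that $(2^z-1)^{M-1}$ does not overflow for large $M$. The trapezoidal rule and the integration interval $[0, 15]$ are our own assumptions, adequate for SNR = 1.

```python
import math


def capacity_pdf(z, M, snr):
    """PDF of C_h = log2(1 + q||h||^2/N0) in (5.125) under i.i.d. Rayleigh fading."""
    if z < 0:
        return 0.0
    if z == 0:
        # (2^z - 1)^(M-1) vanishes at z = 0 unless M = 1
        return math.log(2) / snr if M == 1 else 0.0
    u = 2.0 ** z - 1.0
    # lgamma(M) = ln((M-1)!)
    log_pdf = ((M - 1) * math.log(u) - u / snr + z * math.log(2)
               + math.log(math.log(2)) - M * math.log(snr) - math.lgamma(M))
    return math.exp(log_pdf)


def trapezoid(f, a, b, n=20_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    snr = 1.0  # as in Figure 5.20
    for M in (1, 8, 32):
        area = trapezoid(lambda z: capacity_pdf(z, M, snr), 0.0, 15.0)
        mean = trapezoid(lambda z: z * capacity_pdf(z, M, snr), 0.0, 15.0)
        print(f"M={M:2d}  area={area:.4f}  ergodic capacity={mean:.3f}")
```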


The ergodic capacity is $\mathbb{E}\{C_{\mathbf{h}}\}$, and its value depends on the PDF of $C_{\mathbf{h}}$, which is given in (5.125) for i.i.d. Rayleigh fading. Figure 5.20 shows this PDF with $M = 1$, $M = 8$, or $M = 32$ antennas. The probability mass is shifted to larger values as $M$ increases, which results in a larger mean value (i.e., larger ergodic capacity). The mass also becomes concentrated in a smaller interval, which leads to a larger peak value of the PDF because the area under each curve is 1. It is this phenomenon that is called channel hardening.
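Channel hardening can also be observed by sampling $C_{\mathbf{h}}$ directly. The following sketch uses illustrative parameters (SNR = 1, as in Figure 5.20) and reports the sample mean and standard deviation for $M = 1, 8, 32$. The mean should grow and the spread should shrink as $M$ increases.

```python
import math
import random
import statistics


def sample_capacity(M, snr, n_samples=50_000, seed=4):
    """Draw realizations of C_h = log2(1 + SNR * g) with g ~ Gamma(M, 1)."""
    rng = random.Random(seed)
    return [math.log2(1 + snr * rng.gammavariate(M, 1.0)) for _ in range(n_samples)]


if __name__ == "__main__":
    for M in (1, 8, 32):
        c = sample_capacity(M, snr=1.0)
        print(f"M={M:2d}  mean={statistics.fmean(c):.3f}  std={statistics.stdev(c):.3f}")
```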

Figure 5.21 shows the ergodic capacity of an i.i.d. Rayleigh fading channel with SNR = 10 dB and a varying number of antennas. The exact value is compared with the lower and upper bounds. We notice a huge gap between the curves at $M = 1$, where the lower bound is zero. However, for $M > 5$, the difference between the lower and upper bounds is tiny. More importantly, the ergodic capacity is close to the upper bound, representing the capacity of a non-fading LOS channel with the same SNR. Hence, the rate loss incurred by having a fading channel is relatively small when the receiver is equipped with a handful or more antennas. In other words, the spatial receive diversity makes the performance of communication systems more robust to channel fading.

Figure 5.20: The PDFs of the (conditional) capacity $C_{\mathbf{h}}$ in (5.125) for SNR = 1 and $M = 1$, $M = 8$, or $M = 32$ antennas. As $M$ increases, the probability mass shifts to the right and is concentrated in a smaller interval.

Figure 5.21: The ergodic SIMO capacity of an i.i.d. Rayleigh fading channel is shown for SNR = 10 dB and a varying number of antennas. It is compared with the lower and upper bounds in (5.124), where the upper bound corresponds to the capacity of a non-fading LOS channel.
5.4.2 Ergodic Capacity of i.i.d. Rayleigh Fading MIMO Channels


We now consider the MIMO setup where the transmitter has $K$ antennas and the receiver has $M$ antennas, so that the fast-fading channel is described by the $M \times K$ channel matrix $\mathbf{H}$. The random realization of this channel matrix changes at every time instance. As earlier in this chapter, the receiver knows the realization of $\mathbf{H}$, but the transmitter does not; it only knows the channel statistics. Hence, the transmitter must encode its data signal $\mathbf{x} = [x_1, \ldots, x_K]^{\mathrm{T}}$ in a way that is independent of $\mathbf{H}$, which implies that we cannot create multiple parallel channels using the SVD as in Section 3.4. Instead, we must follow the approach with arbitrary precoding from Section 3.4.3. Suppose the transmitted signal is $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{R}_x)$ for some arbitrary choice of the covariance matrix $\mathbf{R}_x$.¹⁰ For a given realization of $\mathbf{H}$, we concluded earlier in (3.106) that an achievable (conditional) data rate is


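$$\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{R}_x\mathbf{H}^{\mathrm{H}}\right)\right). \tag{5.126}$$

By considering a transmission that takes place over infinitely many time instances with independent channel realizations, an ergodic achievable rate is

$$\mathbb{E}\left\{\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{R}_x\mathbf{H}^{\mathrm{H}}\right)\right)\right\}, \tag{5.127}$$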


where the mean value is computed with respect to the channel matrix H. The 
argumentation for this result is the same as earlier in this chapter; the channel 
realizations are samples from a stationary and ergodic random process; thus, 
the time-average rate equals the statistical mean. 

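The capacity is the maximum achievable data rate. Although the transmitter does not know the channel realizations, it can compute the ergodic rate in (5.127) and adapt its choice of covariance matrix $\mathbf{R}_x$ to maximize it. We notice that the $k$th diagonal entry of $\mathbf{R}_x$ is $\mathbb{E}\{|x_k|^2\}$ and recall that $\mathrm{tr}(\mathbf{R}_x)$ is the sum of the diagonal entries. We want the signal power

$$\mathbb{E}\{\|\mathbf{x}\|^2\} = \sum_{k=1}^{K}\mathbb{E}\{|x_k|^2\} = \mathrm{tr}(\mathbf{R}_x) \tag{5.128}$$

to equal the maximum symbol power $q$. Hence, the ergodic capacity is

$$C = \max_{\mathbf{R}_x \succeq \mathbf{0},\ \mathrm{tr}(\mathbf{R}_x) = q}\ \mathbb{E}\left\{\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{R}_x\mathbf{H}^{\mathrm{H}}\right)\right)\right\}, \tag{5.129}$$

where we need to find the positive semi-definite covariance matrix $\mathbf{R}_x$ that maximizes the ergodic rate under the power constraint $\mathrm{tr}(\mathbf{R}_x) = q$.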


¹⁰ This covariance matrix can be factorized as $\mathbf{R}_x = \mathbf{P}\mathbf{Q}\mathbf{P}^{\mathrm{H}}$ for some precoding matrix $\mathbf{P}$ and diagonal power allocation matrix $\mathbf{Q}$, following the definitions made in Section 3.4.3. However, this specific structure is not needed in this section.
The optimal covariance matrix depends on the distribution of the channel matrix. If we consider i.i.d. Rayleigh fading, meaning that all entries of $\mathbf{H}$ are independent complex Gaussian distributed with zero mean and identical variance, then the optimal covariance matrix has the simple form

$$\mathbf{R}_x = \frac{q}{K}\mathbf{I}_K. \tag{5.130}$$

This means the transmitter should send one independent data symbol from each of its $K$ antennas and divide the power $q$ equally between them. We will outline the proof but refer to [31] for the precise details.

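For any choice of the covariance matrix $\mathbf{R}_x$, we can express its eigendecomposition as $\mathbf{R}_x = \mathbf{U}\mathbf{D}\mathbf{U}^{\mathrm{H}}$, where $\mathbf{U}$ is a unitary matrix and $\mathbf{D}$ is a diagonal matrix. The term $\mathbf{H}\mathbf{R}_x\mathbf{H}^{\mathrm{H}}$ in the ergodic capacity in (5.129) can therefore be expressed as $\mathbf{H}\mathbf{U}\mathbf{D}\mathbf{U}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}} = (\mathbf{H}\mathbf{U})\mathbf{D}(\mathbf{H}\mathbf{U})^{\mathrm{H}}$. Since the entries of $\mathbf{H}$ are independent $\mathcal{N}_{\mathbb{C}}(0, \beta)$-distributed, the entries of $\mathbf{H}\mathbf{U}$ have the same distribution (see Example 2.10). Hence, when computing the mean with respect to the channel in (5.129), it is sufficient to consider diagonal covariance matrices $\mathbf{R}_x = \mathbf{D}$ because the choice of the unitary matrix makes no difference.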

As all transmit antennas experience channels with the same distribution, it should not matter which antenna is assigned a specific amount of the total power; we can always reorder the antennas and get the same result. In particular, it can be proved that $\mathbb{E}\{\log_2(\det(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{D}\mathbf{H}^{\mathrm{H}}))\}$ is a jointly concave and symmetric function of the diagonal entries of $\mathbf{D}$, which implies that the maximum is achieved when all the entries are the same.
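A small experiment can illustrate the optimality of equal power allocation, although it does not prove it. The sketch below assumes $M = K = 2$, $\beta = 1$, and SNR = 10 dB, and it estimates (5.127) for $\mathbf{R}_x = q\,\mathrm{diag}(p_1, p_2)$ with a few hypothetical power splits $(p_1, p_2)$. The $2 \times 2$ determinant is written out explicitly, and all splits reuse the same channel realizations (same seed) so that the comparison is less noisy.

```python
import math
import random


def cn(rng):
    """One sample of CN(0, 1)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def ergodic_rate_2x2(powers, snr, n_samples=50_000, seed=5):
    """Monte Carlo estimate of (5.127) for M = K = 2, R_x = q * diag(powers), beta = 1.

    `powers` are fractions of q that sum to one, and snr = q * beta / N0.
    """
    rng = random.Random(seed)
    p1, p2 = powers
    acc = 0.0
    for _ in range(n_samples):
        a, b = cn(rng), cn(rng)  # first row of H
        c, d = cn(rng), cn(rng)  # second row of H
        # Entries of G = I_2 + SNR * H diag(p1, p2) H^H
        g11 = 1 + snr * (p1 * abs(a) ** 2 + p2 * abs(b) ** 2)
        g22 = 1 + snr * (p1 * abs(c) ** 2 + p2 * abs(d) ** 2)
        g12 = snr * (p1 * a * c.conjugate() + p2 * b * d.conjugate())
        acc += math.log2(g11 * g22 - abs(g12) ** 2)
    return acc / n_samples


if __name__ == "__main__":
    snr = 10.0  # 10 dB
    for powers in ((0.5, 0.5), (0.75, 0.25), (1.0, 0.0)):
        print(powers, f"{ergodic_rate_2x2(powers, snr):.3f} bit/symbol")
```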

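In summary, the ergodic capacity with i.i.d. Rayleigh fading is obtained by substituting (5.130) into (5.129):

$$C = \mathbb{E}\left\{\log_2\left(\det\left(\mathbf{I}_M + \frac{q}{KN_0}\mathbf{H}\mathbf{H}^{\mathrm{H}}\right)\right)\right\}. \tag{5.131}$$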


We note that this result holds even if $K > M$, which means we transmit more signals than the receiver has antennas. This is not an issue since the transmitted data signals are encoded to enable decoding at the receiver side using the SIC procedure, even if the MIMO channel cannot be divided into parallel channels when the transmitter is unaware of the channel realizations. The multiplexing gain is, however, limited to $r = \min(M, K)$ because this is the maximum number of non-zero eigenvalues of $\mathbf{H}\mathbf{H}^{\mathrm{H}}$. If we denote these eigenvalues by $\lambda_1, \ldots, \lambda_r$, then (5.131) can also be expressed as

$$C = \mathbb{E}\left\{\log_2\left(\prod_{n=1}^{r}\left(1 + \frac{q}{KN_0}\lambda_n\right)\right)\right\} = \mathbb{E}\left\{\sum_{n=1}^{r}\log_2\left(1 + \frac{q}{KN_0}\lambda_n\right)\right\} \tag{5.132}$$

by utilizing the fact that the determinant is the product of the eigenvalues. The mean value in (5.132) can be computed in closed form at low SNR.
By utilizing the low-SNR approximation in (3.2), we can rewrite (5.132) as

$$C \approx \mathbb{E}\left\{\sum_{n=1}^{r}\log_2(e)\frac{q}{KN_0}\lambda_n\right\} = \log_2(e)\frac{q}{KN_0}\mathbb{E}\left\{\mathrm{tr}(\mathbf{H}\mathbf{H}^{\mathrm{H}})\right\} = \log_2(e)\frac{q}{KN_0}MK\beta = \log_2(e)\,M\,\mathrm{SNR}. \tag{5.133}$$

The final result follows from the fact that the trace equals the sum of the eigenvalues, $\mathrm{tr}(\mathbf{H}\mathbf{H}^{\mathrm{H}}) = \sum_{n=1}^{r}\lambda_n$, and from computing the mean value $\mathbb{E}\{\mathbf{H}\mathbf{H}^{\mathrm{H}}\} = K\beta\mathbf{I}_M$ using the assumption of i.i.d. Rayleigh fading. We can notice in (5.133) that the capacity is proportional to the number of receive antennas at low SNR (i.e., a receive beamforming gain), while the number of transmit antennas makes no difference. Hence, in the absence of CSI at the transmitter, multiple transmit antennas are only helpful in achieving multiplexing gains.
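The low-SNR approximation (5.133) can be compared with a Monte Carlo evaluation of (5.131). To keep the sketch short, it assumes $M = 2$ receive antennas, so the determinant has a closed form, and $\beta = 1$. The SNR of $-20$ dB is an arbitrary illustrative value. The estimates should be close to $\log_2(e)\,M\,\mathrm{SNR}$ regardless of $K$.

```python
import math
import random


def cn(rng):
    """One sample of CN(0, 1)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def ergodic_capacity_m2(K, snr, n_samples=20_000, seed=6):
    """Monte Carlo estimate of (5.131) for M = 2 receive antennas and beta = 1."""
    rng = random.Random(seed)
    c = snr / K  # the factor q / (K N0) in (5.131) when beta = 1
    acc = 0.0
    for _ in range(n_samples):
        row1 = [cn(rng) for _ in range(K)]
        row2 = [cn(rng) for _ in range(K)]
        g11 = sum(abs(v) ** 2 for v in row1)
        g22 = sum(abs(v) ** 2 for v in row2)
        g12 = sum(u * v.conjugate() for u, v in zip(row1, row2))
        # det(I_2 + c H H^H) written out for the 2 x 2 case
        acc += math.log2((1 + c * g11) * (1 + c * g22) - c * c * abs(g12) ** 2)
    return acc / n_samples


if __name__ == "__main__":
    snr = 10 ** (-20 / 10)  # -20 dB
    approx = math.log2(math.e) * 2 * snr  # (5.133) with M = 2
    for K in (1, 2, 4, 8):
        print(f"K={K}  Monte Carlo={ergodic_capacity_m2(K, snr):.5f}  (5.133)={approx:.5f}")
```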


Example 5.13. How does the ergodic MIMO capacity with i.i.d. Rayleigh fading behave when $K \to \infty$ while $M$ is fixed?

To answer this question, we need to determine the limit of $\frac{1}{K}\mathbf{H}\mathbf{H}^{\mathrm{H}}$ as $K \to \infty$. The $m$th diagonal entry of this matrix is $\frac{1}{K}\sum_{k=1}^{K}|h_{m,k}|^2$, which is the sample average of $K$ i.i.d. random variables. The limit follows from the law of large numbers (Lemma 2.4) and equals the mean $\beta$ of the individual terms $|h_{m,k}|^2$. Similarly, the $(m,l)$th entry is $\frac{1}{K}\sum_{k=1}^{K}h_{m,k}h_{l,k}^*$, where all terms are independent and have zero mean when $m \neq l$. It follows from the law of large numbers that the off-diagonal entries converge to zero. We have thereby proved that $\frac{1}{K}\mathbf{H}\mathbf{H}^{\mathrm{H}} \to \beta\mathbf{I}_M$. By substituting this result into (5.131), we obtain the asymptotic ergodic MIMO capacity

$$C = \log_2\left(\det\left(\mathbf{I}_M + \frac{q\beta}{N_0}\mathbf{I}_M\right)\right) = M\log_2(1 + \mathrm{SNR}), \tag{5.134}$$

where the mean value is removed since the randomness vanishes as $K \to \infty$. This ergodic capacity is $M$ times larger than the SISO capacity of the corresponding non-fading channel. The multiplexing gain is $\min(M, K) = M$.
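The convergence $\frac{1}{K}\mathbf{H}\mathbf{H}^{\mathrm{H}} \to \beta\mathbf{I}_M$ used in Example 5.13 is easy to visualize numerically. This sketch draws one channel realization per value of $K$, using the illustrative choices $M = 4$ and $\beta = 1$. It reports the largest entrywise deviation from $\beta\mathbf{I}_M$, which should shrink roughly like $1/\sqrt{K}$.

```python
import math
import random


def cn(rng, var):
    """One sample of CN(0, var)."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def normalized_gram(M, K, beta, seed=7):
    """Return (1/K) H H^H for one M x K channel matrix with i.i.d. CN(0, beta) entries."""
    rng = random.Random(seed)
    H = [[cn(rng, beta) for _ in range(K)] for _ in range(M)]
    return [[sum(H[i][k] * H[j][k].conjugate() for k in range(K)) / K
             for j in range(M)] for i in range(M)]


def max_deviation_from_identity(G, beta):
    """Largest entrywise distance between G and beta * I."""
    M = len(G)
    return max(abs(G[i][j] - (beta if i == j else 0.0))
               for i in range(M) for j in range(M))


if __name__ == "__main__":
    M, beta = 4, 1.0
    for K in (10, 100, 1_000, 10_000):
        G = normalized_gram(M, K, beta)
        print(f"K={K:6d}  max deviation={max_deviation_from_identity(G, beta):.4f}")
```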


We have now covered the ergodic capacities of SISO, SIMO, and MIMO channels. What remains is to consider the MISO channel, which is a special case of the MIMO channel with $M = 1$ receive antenna. In that special case, the channel matrix can be written as $\mathbf{H} = \mathbf{h}^{\mathrm{T}} = [h_1, \ldots, h_K]$. Hence, the ergodic capacity in (5.131) reduces to

$$C = \mathbb{E}\left\{\log_2\left(\det\left(1 + \frac{q}{KN_0}\mathbf{h}^{\mathrm{T}}\mathbf{h}^*\right)\right)\right\} = \mathbb{E}\left\{\log_2\left(1 + \frac{q}{KN_0}\|\mathbf{h}\|^2\right)\right\}. \tag{5.135}$$
This capacity expression resembles the ergodic SIMO capacity in (5.118), but with a crucial difference: instead of getting the sum of the channel gains $\|\mathbf{h}\|^2 = \sum_{m=1}^{M}|h_m|^2$ of the receive antennas, we get the average $\|\mathbf{h}\|^2/K = \sum_{k=1}^{K}|h_k|^2/K$ of the channel gains among the transmit antennas. The division by $K$ represents the absence of a beamforming gain in the MISO case, as we previously observed in slow fading. One way to interpret this is that the transmitter must spread its power over all $K$ dimensions in $\mathbb{C}^K$ to ensure it reaches the receiver, even if only one randomly selected dimension leads to the receiver. By sending $K$ independent signals from the antennas, the MISO channel achieves a diversity gain that provides channel hardening, making the setup preferable over a SISO channel.

Figure 5.22: The ergodic capacity over i.i.d. Rayleigh fading channels. We compare a SISO channel with a SIMO channel with $M = 4$ antennas, a MISO channel with $K = 4$ antennas, and a MIMO channel with $M = K = 4$ antennas.

We previously presented a duality result in Corollary 3.4, which stated 
that the capacity is the same in both directions of a MIMO channel if the ratio 
between the total transmit power and noise variance is the same. This result 
was obtained for a non-fading channel known at both the transmitter and 
the receiver. Considering the ergodic capacity in (5.131), the same result can 
only be obtained in the symmetric case of M = K, where there are equally 
many antennas in both directions. In contrast, we observed that the ergodic 
capacities of MISO and SIMO systems are very different because we can only 
obtain a beamforming gain at the receiver side. 

Figure 5.22 shows the ergodic capacity as a function of the SNR in the 
case of i.i.d. Rayleigh fading channels. We compare four setups: a SISO case, 
a MISO case with K = 4 transmit antennas, a SIMO case with M = 4 
receive antennas, and a MIMO case with M = K = 4 antennas. Note that 
all four cases can be computed using the MIMO expression in (5.131), but 
the mean value must be computed numerically. The results resemble those in 
Figure 3.15 for non-fading channels, except that the SIMO and MISO cases
are not equal anymore. The SISO case gives the lowest ergodic capacity. The 
MISO case is slightly better than the SISO case at high SNR, thanks to the 
diversity gain, but the performance gap is just around 2dB. The SIMO case 
gives a curve having roughly the same shape as in the SISO setup, but it is 
shifted to the left thanks to both the beamforming gain and the diversity 
gain. The gap to the SISO curve is around 8dB at high SNR, of which 6dB 
is the beamforming gain and the remaining 2dB is the diversity gain (same 
as in the MISO case). Note that the relative gain of having multiple receive 
antennas is greater for fading channels than for non-fading channels, for which 
there is only a beamforming gain. However, in absolute numbers, the ergodic 
capacities in Figure 5.22 are always smaller than the corresponding capacities 
of the non-fading channels in Figure 3.15. The highest capacity is achieved in 
the MIMO case, where a multiplexing gain is achieved since the transmitter 
sends four signals. At high SNR, the capacity curve grows as $M\log_2(\mathrm{SNR})$, roughly $M$ times faster than in the SISO case.
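Since the mean value in (5.131) must be computed numerically, a Monte Carlo sketch of the four setups in Figure 5.22 may be helpful. It assumes $\beta = 1$, so that SNR $= q/N_0$. It evaluates $\log_2\det(\cdot)$ through a Cholesky factorization, which is valid because $\mathbf{I}_M + \frac{q}{KN_0}\mathbf{H}\mathbf{H}^{\mathrm{H}}$ is Hermitian positive definite. The function names and sample sizes are our own choices, and the results only approximate the curves in the figure.

```python
import math
import random


def cn(rng):
    """One sample of CN(0, 1)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def log2_det_hpd(A):
    """log2 det(A) for a Hermitian positive definite A via Cholesky, A = L L^H."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            if i == j:
                L[i][i] = complex(math.sqrt(s.real), 0.0)
                log_det += 2.0 * math.log2(L[i][i].real)
            else:
                L[i][j] = s / L[j][j]
    return log_det


def ergodic_capacity(M, K, snr, n_samples=5_000, seed=8):
    """Monte Carlo estimate of (5.131) with beta = 1, so that SNR = q / N0."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        H = [[cn(rng) for _ in range(K)] for _ in range(M)]
        # G = I_M + (SNR / K) H H^H
        G = [[(1.0 if i == j else 0.0)
              + snr / K * sum(H[i][k] * H[j][k].conjugate() for k in range(K))
              for j in range(M)] for i in range(M)]
        acc += log2_det_hpd(G)
    return acc / n_samples


if __name__ == "__main__":
    setups = {"SISO": (1, 1), "MISO": (1, 4), "SIMO": (4, 1), "MIMO": (4, 4)}
    for snr_db in (0, 10, 20):
        snr = 10 ** (snr_db / 10)
        row = ", ".join(f"{name}={ergodic_capacity(M, K, snr):.2f}"
                        for name, (M, K) in setups.items())
        print(f"SNR={snr_db:2d} dB: {row}")
```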


5.5 Block Fading and Channel Estimation 


The capacity analysis in this book has thus far relied on the assumption that 
the receiver knows the channel perfectly. This is well motivated when the 
channel takes a single realization throughout the communication because the 
capacity is (by definition) achieved by transmitting very long data packets. 
Section 4.2.4 exemplified how a preamble containing a known pilot sequence 
can be appended to the packet to enable perfect channel estimation while still 
being negligibly small compared to the payload part that contains data. This 
argument holds in both scenarios with deterministic LOS channels (Chapter 4) 
and slow fading (Section 5.3), but not under fast fading. When the channel 
varies rapidly, we need to transmit new pilot sequences at the same pace as 
the channel changes, which has two main consequences. Firstly, the fraction 
of symbols spent on pilots instead of data is non-negligible. Secondly, the 
limited pilot sequence length leads to non-zero estimation errors, so perfect 
channel knowledge is no longer achieved. In this section, we will quantify 
the channel estimation errors and their detrimental impact on the ergodic 
capacity, following a methodology originating from [69], [70]. 

We return to the block fading model introduced when describing Figure 5.9. In this model, we treat the fading channel as being piecewise constant over short time intervals, but every time the channel changes, the new realization is generated independently. An interval with a constant channel realization is called a coherence block. We let $L_c$ denote the number of symbols that can be transmitted within each block. We must divide the symbols between transmitting a known pilot sequence that enables channel estimation at the receiver and unknown payload data that the receiver can decode using the acquired CSI. The resulting transmission protocol is repeated in each coherence block, but for a new independent channel realization, as illustrated in Figure 5.23. This repetition makes it sufficient to study the operation of a single coherence block with an arbitrary random channel realization. There are two phases of each coherence block:

• Preamble: A known pilot sequence of length $L_p$ symbols is transmitted.
• Payload: $L_c - L_p$ data symbols are transmitted.

We will analyze these phases in detail in the following sections.

Figure 5.23: A block-fading channel has a piecewise constant channel response. Each such segment is called a coherence block. This is an abstraction of the dashed continuously varying channel. $L_c$ symbols can be transmitted within each coherence block, of which $L_p$ symbols are used to transmit a preamble containing a pilot sequence that enables channel estimation. The rest contains a payload with $L_c - L_p$ data symbols.


5.5.1 Pilot-Based Channel Estimation 


We begin by considering the channel estimation enabled by pilot transmission in a SIMO scenario where the channel vector $\mathbf{h} \in \mathbb{C}^M$ is subject to i.i.d. Rayleigh fading: $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta\mathbf{I}_M)$. We select a known pilot of length $L_p = 1$ that equals $\sqrt{q}$, thereby using the maximum symbol power. The received signal then becomes

$$\mathbf{y} = \mathbf{h}\sqrt{q} + \mathbf{n}, \tag{5.136}$$




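where $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the receiver noise. Since the channel and noise are independent random variables, it follows that $\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, (\beta q + N_0)\mathbf{I}_M)$.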

The $m$th received signal in (5.136) has the form $y_m = h_m\sqrt{q} + n_m$, where the unknown channel realization $h_m \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$ is observed in additive noise. Since the antennas' channel coefficients are independently distributed, we can estimate them separately. The received signal has the same structure as in Lemma 2.11, which implies that the MMSE estimate of $h_m$ given the observation $y_m$ is

$$\hat{h}_m = \frac{\sqrt{q}\beta}{q\beta + N_0}y_m. \tag{5.137}$$

Among all conceivable ways of transforming $y_m$ into a guess of $h_m$, (5.137) is the option that minimizes the average squared estimation error $\mathbb{E}\{|h_m - \hat{h}_m|^2\}$. The minimal value is given by (2.156) as

$$\mathrm{MSE}_h = \mathbb{E}\{|h_m - \hat{h}_m|^2\} = \frac{\beta N_0}{q\beta + N_0}. \tag{5.138}$$


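We can collect all the MMSE estimates of the channel coefficients in the vector form $\hat{\mathbf{h}} \in \mathbb{C}^M$ as

$$\hat{\mathbf{h}} = [\hat{h}_1, \ldots, \hat{h}_M]^{\mathrm{T}} = \frac{\sqrt{q}\beta}{q\beta + N_0}\mathbf{y}. \tag{5.139}$$

This random vector also takes a new realization in every coherence block since we repeat the estimation once per block. By using the aforementioned distribution of $\mathbf{y}$, we notice that

$$\hat{\mathbf{h}} \sim \mathcal{N}_{\mathbb{C}}\left(\mathbf{0}, \left(\frac{\sqrt{q}\beta}{q\beta + N_0}\right)^2(\beta q + N_0)\mathbf{I}_M\right) = \mathcal{N}_{\mathbb{C}}\left(\mathbf{0}, \frac{q\beta^2}{q\beta + N_0}\mathbf{I}_M\right) = \mathcal{N}_{\mathbb{C}}\left(\mathbf{0}, (\beta - \mathrm{MSE}_h)\mathbf{I}_M\right). \tag{5.140}$$

We will denote the estimation error as $\tilde{\mathbf{h}} = \mathbf{h} - \hat{\mathbf{h}}$; each of its entries has a variance that equals the MSE, such that

$$\tilde{\mathbf{h}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathrm{MSE}_h\mathbf{I}_M). \tag{5.141}$$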


The entries of the estimate $\hat{\mathbf{h}}$ and the estimation error $\tilde{\mathbf{h}}$ have variances that add up to the variance of the entries of the original channel $\mathbf{h}$ since $(\beta - \mathrm{MSE}_h) + \mathrm{MSE}_h = \beta$. Hence, the channel estimation effectively splits the true channel into one known part $\hat{\mathbf{h}}$ with the reduced variance $\beta - \mathrm{MSE}_h$ and an unknown part $\tilde{\mathbf{h}}$ with the remaining variance $\mathrm{MSE}_h$. Having an estimate with a large variance is good because we want the known part of the channel to be strong. An accurate estimate is characterized by a small MSE, implying that the estimate is likely near the true channel realization.
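The estimator (5.137) and the error statistics (5.138) and (5.140) can be verified by simulation. Because the antennas are estimated separately, the sketch below simulates a single antenna. The parameter values $\beta = 1$, $q = 1$, and $N_0 = 0.5$ are illustrative and give $\mathrm{MSE}_h = 1/3$.

```python
import math
import random


def cn(rng, var):
    """One sample of CN(0, var)."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate_pilot_estimation(beta, q, N0, n_trials=100_000, seed=9):
    """Empirical variance of the MMSE estimate and of the error for one antenna."""
    rng = random.Random(seed)
    gain = math.sqrt(q) * beta / (q * beta + N0)  # MMSE coefficient in (5.137)
    est_power = 0.0
    err_power = 0.0
    for _ in range(n_trials):
        h = cn(rng, beta)                    # true channel coefficient
        y = h * math.sqrt(q) + cn(rng, N0)   # pilot observation, (5.136)
        h_hat = gain * y                     # MMSE estimate
        est_power += abs(h_hat) ** 2
        err_power += abs(h - h_hat) ** 2
    return est_power / n_trials, err_power / n_trials


if __name__ == "__main__":
    beta, q, N0 = 1.0, 1.0, 0.5
    mse = beta * N0 / (q * beta + N0)  # (5.138)
    est_var, err_var = simulate_pilot_estimation(beta, q, N0)
    print(f"error variance: simulated {err_var:.4f}, (5.138) {mse:.4f}")
    print(f"estimate variance: simulated {est_var:.4f}, beta - MSE_h {beta - mse:.4f}")
```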
Figure 5.24: In a MIMO setup with K transmit antennas, we need to transmit K different
pilots to enable estimation of the entire channel matrix H on the receiver side. The receiver can
be equipped with any number of antennas, M, since these listen to the same pilots.


The procedure mentioned above enables the estimation of the entire SIMO channel vector $\mathbf{h}$ by transmitting only a single pilot symbol (i.e., $L_p = 1$). This procedure works regardless of how many antennas the receiver has and resembles how public speeches are carried out: any number of people can listen simultaneously to the same speaker. However, the audience members need to take turns when asking questions to the speaker, which can become cumbersome when the audience is large. Analogously, the number of transmit antennas, not the number of receive antennas, determines how many pilots must be transmitted to estimate the entire channel. We need $L_p = K$ pilots if there are $K$ transmit antennas.


We now switch focus to a MIMO channel with $K$ transmit antennas and $M$ receive antennas. It can be represented by the channel matrix $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K] \in \mathbb{C}^{M \times K}$, which contains $K$ columns denoted as $\mathbf{h}_k \in \mathbb{C}^M$ for $k = 1, \ldots, K$. Suppose we transmit $L_p = K$ pilots so that $\sqrt{q}$ is transmitted from antenna $k$ at symbol time $k$. Figure 5.24 illustrates such a setup with $K = 2$ transmit antennas and $M = 5$ receive antennas, in which case we need to transmit two pilots. The received signal at symbol time $k$ becomes

$$\mathbf{y}_k = \mathbf{h}_k\sqrt{q} + \mathbf{n}_k, \quad k = 1, \ldots, K, \tag{5.142}$$

where $\mathbf{n}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the receiver noise.

The received signal in (5.142) has the same form as in (5.136) for the SIMO case. Hence, we can effectively divide the MIMO estimation problem into $K$ SIMO estimation subproblems. Suppose $\mathbf{H}$ is modeled by i.i.d. Rayleigh fading with the entries independently distributed as $h_{m,k} \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$, for $m = 1, \ldots, M$ and $k = 1, \ldots, K$. It then follows from the previous derivations that the MMSE estimate is a matrix $\hat{\mathbf{H}}$ and the estimation error is a matrix $\tilde{\mathbf{H}} = \mathbf{H} - \hat{\mathbf{H}}$, with entries independently distributed as

$$\hat{h}_{m,k} \sim \mathcal{N}_{\mathbb{C}}(0, \beta - \mathrm{MSE}_h), \tag{5.143}$$

$$\tilde{h}_{m,k} \sim \mathcal{N}_{\mathbb{C}}(0, \mathrm{MSE}_h). \tag{5.144}$$


The receiver thus obtains the channel estimates, which is aligned with our standing assumption in this chapter that the receiver has CSI but the transmitter does not. To provide the transmitter with CSI, we need to transmit in the reverse direction, either by feeding back the estimate or by also sending pilots in that direction. This is done in some practical systems but is not considered in this chapter.


5.5.2 Ergodic Rate with Imperfect CSI at the Receiver 


We will now consider how the channel estimates from the last section can be used for signal detection and, particularly, how the achievable data rate is affected by the fact that the receiver has imperfect CSI. We consider the transmission of a packet that spans $L$ coherence blocks and will let $L \to \infty$ to characterize the ergodic capacity. To this end, we revisit the memoryless SISO channel in (2.130). The received signal at an arbitrary time instance in coherence block $l$ can be expressed as

$$y[l] = \left(\hat{h}[l] + \tilde{h}[l]\right)x[l] + n[l], \quad l = 1, \ldots, L, \tag{5.145}$$

where $\hat{h}[l]$ is the MMSE estimated part of the channel response obtained by sending $L_p = 1$ pilot and $\tilde{h}[l] \sim \mathcal{N}_{\mathbb{C}}(0, \mathrm{MSE}_h)$ is the independent estimation error. Similar to (5.104), the capacity is given by


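$$C = \lim_{L \to \infty}\ \max_{f_{\mathbf{x}}(x[1], \ldots, x[L])}\left(1 - \frac{1}{L_c}\right)R_L, \tag{5.146}$$

where $1 - L_p/L_c = 1 - 1/L_c$ is the fraction of each coherence block used for data transmission (while the fraction $1/L_c$ is used for pilots), and the average data rate over the $L$ coherence blocks is

$$\begin{aligned}
R_L &= \frac{1}{L}\Big(\mathcal{H}(x[1], \ldots, x[L]) - \mathcal{H}\big(x[1], \ldots, x[L] \,\big|\, y[1], \ldots, y[L], \hat{h}[1], \ldots, \hat{h}[L]\big)\Big) \\
&= \frac{1}{L}\sum_{l=1}^{L}\Big(\mathcal{H}(x[l]) - \mathcal{H}\big(x[l] \,\big|\, y[l], \hat{h}[l]\big)\Big).
\end{aligned} \tag{5.147}$$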


Note that we used the alternative mutual information expression in (2.137), which equals the entropy of the transmitted signal minus the remaining entropy given the information known at the receiver (which is $y[l]$ and $\hat{h}[l]$ in this case). This expression is easier to evaluate under imperfect CSI. The block-fading assumption implies that independent realizations of $\hat{h}[l], \tilde{h}[l]$ are drawn in each coherence block from a stationary and ergodic random process. Hence, we can use the law of large numbers (Lemma 2.4) to compute the limit in (5.146) as

$$C = \max_{f_x(x)}\left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\mathcal{H}(x) - \mathcal{H}(x \,|\, y, \hat{h})\right\}, \tag{5.148}$$

where the mean value is computed with respect to the channel estimate realization $\hat{h}$, and the maximum is computed over all signal distributions $f_x(x)$ that have a symbol power of $q$. There is no known way to compute the exact capacity when the receiver has imperfect CSI; this remains one of the open problems in information theory. Therefore, we will derive a tight lower bound by following an approach from [69], [70]. To prepare for that derivation, we begin by considering the estimation of the data signal $x$.


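Example 5.14. What is the LMMSE estimate of $x$ when $y$ in (5.145) is observed and $\hat{h}$ is known? What is the resulting conditional MSE?

The LMMSE estimator has the form $\hat{x} = ay$, where $a$ is selected to minimize the MSE by satisfying the orthogonality principle $\mathbb{E}\{\tilde{x}y^*|\hat{h}\} = 0$, with $\tilde{x} = x - \hat{x}$ being the estimation error. This condition can be expanded as

$$0 = \mathbb{E}\{\tilde{x}y^*|\hat{h}\} = \mathbb{E}\{(x - ay)y^*|\hat{h}\} = \mathbb{E}\{xy^*|\hat{h}\} - a\,\mathbb{E}\{|y|^2|\hat{h}\}. \tag{5.149}$$

By solving for $a$ in (5.149), we obtain

$$\begin{aligned}
a &= \frac{\mathbb{E}\{xy^*|\hat{h}\}}{\mathbb{E}\{|y|^2|\hat{h}\}} = \frac{\mathbb{E}\left\{x\left((\hat{h} + \tilde{h})x + n\right)^* \big| \hat{h}\right\}}{\mathbb{E}\left\{\left|(\hat{h} + \tilde{h})x + n\right|^2 \big| \hat{h}\right\}} \\
&= \frac{\mathbb{E}\{|x|^2\}\left(\hat{h}^* + \mathbb{E}\{\tilde{h}^*\}\right) + \mathbb{E}\{x\}\mathbb{E}\{n^*\}}{\mathbb{E}\{|x|^2\}\left(|\hat{h}|^2 + \mathbb{E}\{|\tilde{h}|^2\}\right) + \mathbb{E}\{|n|^2\}} = \frac{q\hat{h}^*}{q|\hat{h}|^2 + q\mathrm{MSE}_h + N_0}
\end{aligned} \tag{5.150}$$

by using the fact that $x$, $\tilde{h}$, and $n$ are uncorrelated and have zero mean (conditioned on $\hat{h}$). The resulting conditional MSE for the given value of $\hat{h}$ is

$$\begin{aligned}
\mathrm{MSE}_{x|\hat{h}} &= \mathbb{E}\{|\tilde{x}|^2|\hat{h}\} = \mathbb{E}\{\tilde{x}x^*|\hat{h}\} - a^*\underbrace{\mathbb{E}\{\tilde{x}y^*|\hat{h}\}}_{=0} = \mathbb{E}\{|x|^2\} - a\,\mathbb{E}\{yx^*|\hat{h}\} \\
&= q - \frac{q^2|\hat{h}|^2}{q|\hat{h}|^2 + q\mathrm{MSE}_h + N_0} = \frac{q(q\mathrm{MSE}_h + N_0)}{q|\hat{h}|^2 + q\mathrm{MSE}_h + N_0},
\end{aligned} \tag{5.151}$$

where we used the orthogonality principle, reused computations from (5.150), and finally utilized that $\mathbb{E}\{|x|^2\} = q$.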
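The conditional MSE (5.151) can be checked by fixing a realization $\hat{h}$ and averaging over $x$, $\tilde{h}$, and $n$. In this sketch, the value of $\hat{h}$ and the other parameters are hypothetical. The LMMSE coefficient depends only on second-order statistics, so drawing $x$ from a Gaussian distribution is a convenience rather than a requirement.

```python
import math
import random


def cn(rng, var):
    """One sample of CN(0, var)."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def lmmse_check(h_hat, q, mse_h, N0, n_trials=100_000, seed=10):
    """Return (empirical MSE of x_hat = a*y, formula (5.151)) for a fixed h_hat."""
    denom = q * abs(h_hat) ** 2 + q * mse_h + N0
    a = q * h_hat.conjugate() / denom            # LMMSE coefficient, (5.150)
    mse_formula = q * (q * mse_h + N0) / denom   # conditional MSE, (5.151)
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_trials):
        x = cn(rng, q)                           # data symbol
        h_tilde = cn(rng, mse_h)                 # unknown estimation error
        y = (h_hat + h_tilde) * x + cn(rng, N0)  # received signal, (5.145)
        acc += abs(x - a * y) ** 2
    return acc / n_trials, mse_formula


if __name__ == "__main__":
    empirical, formula = lmmse_check(0.8 + 0.3j, q=1.0, mse_h=1 / 3, N0=0.5)
    print(f"empirical MSE={empirical:.4f}, (5.151)={formula:.4f}")
```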
We will make two (potentially) suboptimal assumptions to obtain a closed-form lower bound on the capacity. The first assumption is that $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$. This is the optimal signal distribution when the receiver has perfect CSI, but not necessarily in our scenario with imperfect CSI. Under this assumption, we can use Lemma 2.9 to compute the first term in (5.148) as

$$\mathcal{H}(x) = \log_2(\pi e q). \tag{5.152}$$

The second suboptimal assumption is that the receiver computes an LMMSE estimate of $x$ based on its available observations $y, \hat{h}$ and uses the resulting conditional MSE in (5.151) to upper bound the second term in (5.148) as

$$\begin{aligned}
\mathcal{H}(x|y, \hat{h}) &= \mathcal{H}(x - \hat{x}|y, \hat{h}) \\
&\leq \mathcal{H}(x - \hat{x}|\hat{h}) \\
&\leq \log_2\left(\pi e\,\mathrm{MSE}_{x|\hat{h}}\right).
\end{aligned} \tag{5.153}$$

The equality in (5.153) follows from subtracting the LMMSE estimate $\hat{x}$ obtained in Example 5.14 from $x$. This can be done without changing the differential entropy since $\hat{x}$ is deterministic given $y$ and $\hat{h}$. We then obtain the first upper bound by removing the knowledge of $y$, since conditioning on a random variable cannot increase the entropy. The second upper bound follows from Lemma 2.9, since the differential entropy is maximized by a complex Gaussian distribution with the same variance as that of $x - \hat{x}$ (conditioned on the realization $\hat{h}$), which was given in (5.151).

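By utilizing (5.152) and (5.153), we can obtain a lower bound on the ergodic capacity in (5.148) as

$$\begin{aligned}
C &\geq \left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\mathcal{H}(x) - \mathcal{H}(x|y, \hat{h})\right\} \\
&\geq \left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\log_2\left(\frac{q}{\mathrm{MSE}_{x|\hat{h}}}\right)\right\} = \left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\log_2\left(1 + \frac{q|\hat{h}|^2}{q\mathrm{MSE}_h + N_0}\right)\right\}.
\end{aligned} \tag{5.154}$$

The first lower bound follows from assuming that the transmitter uses a suboptimal signal distribution. The second lower bound follows from (5.153), which assumes that the receiver decodes the signal in a suboptimal way. We have now proved the following result.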
Corollary 5.2. Consider the discrete memoryless block-fading channel with input $x[l] \in \mathbb{C}$ and output $y[l] \in \mathbb{C}$ given by

$$y[l] = \left(\hat{h}[l] + \tilde{h}[l]\right)x[l] + n[l], \tag{5.155}$$

where $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. Suppose the input distribution is feasible whenever the symbol power satisfies $\mathbb{E}\{|x[l]|^2\} \leq q$. Furthermore, suppose the channel $h[l] = \hat{h}[l] + \tilde{h}[l]$ takes independent and identically distributed realizations in each coherence block from a distribution with finite variance, and that a fraction $1/L_c$ of each block is used for pilots. If the channel estimate $\hat{h}[l]$ is known at the output while the channel estimation error $\tilde{h}[l] \sim \mathcal{N}_{\mathbb{C}}(0, \mathrm{MSE}_h)$ is unknown and independent, the channel capacity can be lower bounded as

$$C \geq \left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\log_2\left(1 + \frac{q|\hat{h}|^2}{q\mathrm{MSE}_h + N_0}\right)\right\} \quad \text{bit/symbol}, \tag{5.156}$$

where the bound is achieved when $x[l] \sim \mathcal{N}_{\mathbb{C}}(0, q)$ and independent for each $l$.


The lower bound in (5.156) has the familiar form $(1 - 1/L_c)\mathbb{E}\{\log_2(1 + \mathrm{SINR})\}$, where $q|\hat{h}|^2/(q\mathrm{MSE}_h + N_0)$ acts as the instantaneous SINR. If we compare the new expression with the ergodic capacity $\mathbb{E}\{\log_2(1 + q|h|^2/N_0)\}$ in (5.111) for a fast-fading channel with perfect CSI, we can notice three key differences. Firstly, the pre-log factor $(1 - 1/L_c)$ in (5.156) accounts for the transmission resources spent on channel estimation in each coherence block. This makes the new expression more realistic since CSI cannot be acquired for free. Secondly, the channel response $h$ in (5.111) is replaced by the channel estimate $\hat{h}$ in (5.156). As the variance of these coefficients determines the average channel gain, it has effectively been reduced from $\beta$ to $\beta - \mathrm{MSE}_h$. Finally, the noise variance $N_0$ has been replaced with $q\mathrm{MSE}_h + N_0$, which also contains a penalty term determined by the imperfect CSI. This penalty term equals the variance of the signal component $\tilde{h}x$ received over the unknown portion of the channel, which is a noise-like perturbation that is worst-case modeled as complex Gaussian noise when computing the capacity bound.
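The effects discussed above can be quantified by evaluating the lower bound (5.156) and the perfect-CSI ergodic capacity (5.111) by Monte Carlo. The sketch assumes $L_p = 1$ and $\beta = N_0 = 1$. It uses the fact that $|\hat{h}|^2$ and $|h|^2$ are exponentially distributed with means $\beta - \mathrm{MSE}_h$ and $\beta$, respectively. Like (5.111), the perfect-CSI value ignores the pilot overhead. The function name and parameter values are hypothetical.

```python
import math
import random


def siso_rates(snr_db, L_c, beta=1.0, N0=1.0, n_samples=100_000, seed=11):
    """Return (lower bound (5.156), perfect-CSI ergodic capacity (5.111))."""
    q = 10 ** (snr_db / 10) * N0 / beta   # so that q * beta / N0 equals the SNR
    mse = beta * N0 / (q * beta + N0)     # (5.138) with L_p = 1
    rng = random.Random(seed)
    lb_acc = 0.0
    perfect_acc = 0.0
    for _ in range(n_samples):
        g_hat = rng.expovariate(1.0 / (beta - mse))  # |h_hat|^2 with mean beta - MSE_h
        g = rng.expovariate(1.0 / beta)              # |h|^2 with mean beta
        lb_acc += math.log2(1 + q * g_hat / (q * mse + N0))
        perfect_acc += math.log2(1 + q * g / N0)
    return (1 - 1 / L_c) * lb_acc / n_samples, perfect_acc / n_samples


if __name__ == "__main__":
    for snr_db in (0, 10, 20):
        lb, perfect = siso_rates(snr_db, L_c=100)
        print(f"SNR={snr_db:2d} dB  bound={lb:.3f}  perfect CSI={perfect:.3f}")
```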

We will now generalize the analysis by considering a SIMO setup with $M$ antennas at the receiver, in which case the received signal $\mathbf{y} \in \mathbb{C}^M$ in an arbitrary coherence block can be expressed as

$$\mathbf{y} = (\hat{\mathbf{h}} + \tilde{\mathbf{h}})x + \mathbf{n}, \tag{5.157}$$

where $\hat{\mathbf{h}} \in \mathbb{C}^M$ is the known channel estimate in (5.140) obtained using $L_p = 1$ pilot, $\tilde{\mathbf{h}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathrm{MSE}_h\mathbf{I}_M)$ is the independent unknown estimation error in (5.141), and $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the receiver noise. We can notice from the SISO case in (5.154) that the ergodic capacity can be lower bounded as

$$C \geq \left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\log_2\left(\frac{q}{\mathrm{MSE}_{x|\hat{\mathbf{h}}}}\right)\right\}, \tag{5.158}$$

where $q$ is the transmit power and $\mathrm{MSE}_{x|\hat{\mathbf{h}}}$ is the conditional MSE when computing the LMMSE estimate of $x$ given the received signal $\mathbf{y}$ and channel estimate $\hat{\mathbf{h}}$. This is a lower bound because it relies on the suboptimal assumptions of a Gaussian codebook $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$ and of LMMSE processing for signal detection at the receiver, but at least we know that the rate can be achieved in this particular way. The LMMSE estimate of $x$ has the form $\hat{x} = \mathbf{w}^{\mathrm{H}}\mathbf{y}$, where $\mathbf{w} \in \mathbb{C}^M$ is a receive combining vector. We can determine the LMMSE combining vector using the orthogonality principle (as in Example 5.14), but we will instead follow the approach from Example 3.4 to directly compute the MSE as


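$$\begin{aligned}
\mathbb{E}\{|x - \hat{x}|^2|\hat{\mathbf{h}}\} &= \mathbb{E}\left\{\left|x\left(1 - \mathbf{w}^{\mathrm{H}}(\hat{\mathbf{h}} + \tilde{\mathbf{h}})\right) - \mathbf{w}^{\mathrm{H}}\mathbf{n}\right|^2 \Big| \hat{\mathbf{h}}\right\} \\
&\overset{(a)}{=} \mathbb{E}\{|x|^2\}\,\mathbb{E}\left\{\left|1 - \mathbf{w}^{\mathrm{H}}(\hat{\mathbf{h}} + \tilde{\mathbf{h}})\right|^2 \Big| \hat{\mathbf{h}}\right\} + \mathbb{E}\left\{|\mathbf{w}^{\mathrm{H}}\mathbf{n}|^2 \big| \hat{\mathbf{h}}\right\} \\
&= q\left(1 + \mathbf{w}^{\mathrm{H}}\left(\hat{\mathbf{h}}\hat{\mathbf{h}}^{\mathrm{H}} + \mathrm{MSE}_h\mathbf{I}_M\right)\mathbf{w} - \mathbf{w}^{\mathrm{H}}\hat{\mathbf{h}} - \hat{\mathbf{h}}^{\mathrm{H}}\mathbf{w}\right) + \mathbf{w}^{\mathrm{H}}N_0\mathbf{I}_M\mathbf{w} \\
&= q + \mathbf{w}^{\mathrm{H}}\underbrace{\left(q\hat{\mathbf{h}}\hat{\mathbf{h}}^{\mathrm{H}} + (q\mathrm{MSE}_h + N_0)\mathbf{I}_M\right)}_{=\mathbf{B}}\mathbf{w} - \mathbf{w}^{\mathrm{H}}\underbrace{q\hat{\mathbf{h}}}_{=\mathbf{a}} - \underbrace{q\hat{\mathbf{h}}^{\mathrm{H}}}_{=\mathbf{a}^{\mathrm{H}}}\mathbf{w},
\end{aligned} \tag{5.159}$$

where (a) follows from utilizing the (conditional) uncorrelation $\mathbb{E}\{x\mathbf{n}^{\mathrm{H}}|\hat{\mathbf{h}}\} = \mathbf{0}$ between the signal $x$ and the noise $\mathbf{n}$ to remove some of the terms. We stress that the combining vector was treated as deterministic when $\hat{\mathbf{h}}$ is known, but we want to find the optimal way in which it depends on the estimate. By utilizing the notation $\mathbf{a}$ and $\mathbf{B}$ that was introduced in (5.159), we can write the MSE as

$$\begin{aligned}
\mathbb{E}\{|x - \hat{x}|^2|\hat{\mathbf{h}}\} &= q + \mathbf{w}^{\mathrm{H}}\mathbf{B}\mathbf{w} - \mathbf{w}^{\mathrm{H}}\mathbf{a} - \mathbf{a}^{\mathrm{H}}\mathbf{w} \\
&= q - \mathbf{a}^{\mathrm{H}}\mathbf{B}^{-1}\mathbf{a} + (\mathbf{w} - \mathbf{B}^{-1}\mathbf{a})^{\mathrm{H}}\mathbf{B}(\mathbf{w} - \mathbf{B}^{-1}\mathbf{a}) \\
&\geq q - \mathbf{a}^{\mathrm{H}}\mathbf{B}^{-1}\mathbf{a} = \mathrm{MSE}_{x|\hat{\mathbf{h}}},
\end{aligned} \tag{5.160}$$

where we complete the square¹¹ and then notice that the quadratic form in the last term attains its minimum value of zero if $\mathbf{w} = \mathbf{B}^{-1}\mathbf{a}$. This is the LMMSE receive combining vector, and it can be rewritten using (2.49) as

$$\mathbf{w} = \mathbf{B}^{-1}\mathbf{a} = \left(q\hat{\mathbf{h}}\hat{\mathbf{h}}^{\mathrm{H}} + (q\mathrm{MSE}_h + N_0)\mathbf{I}_M\right)^{-1}q\hat{\mathbf{h}} = \frac{q}{q\|\hat{\mathbf{h}}\|^2 + q\mathrm{MSE}_h + N_0}\hat{\mathbf{h}}. \tag{5.161}$$

¹¹ We utilize the fact that $(\mathbf{w} - \mathbf{B}^{-1}\mathbf{a})^{\mathrm{H}}\mathbf{B}(\mathbf{w} - \mathbf{B}^{-1}\mathbf{a}) = \mathbf{a}^{\mathrm{H}}\mathbf{B}^{-1}\mathbf{a} + \mathbf{w}^{\mathrm{H}}\mathbf{B}\mathbf{w} - \mathbf{w}^{\mathrm{H}}\mathbf{a} - \mathbf{a}^{\mathrm{H}}\mathbf{w}$ to gather all the terms that depend on $\mathbf{w}$ in a quadratic form. The missing term $\mathbf{a}^{\mathrm{H}}\mathbf{B}^{-1}\mathbf{a}$ must be subtracted when doing that. This is the matrix algebra equivalent of completing the square in a scalar quadratic expression.

This LMMSE combining vector is the counterpart to MRC under imperfect CSI because it projects the received signal onto the direction of the channel estimate. The minimum MSE in (5.160) can now be expressed concisely using (5.161) as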


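$$\mathrm{MSE}_{x|\hat{\mathbf{h}}} = q - \mathbf{a}^{\mathrm{H}}\mathbf{B}^{-1}\mathbf{a} = q - q\frac{q\|\hat{\mathbf{h}}\|^2}{q\|\hat{\mathbf{h}}\|^2 + q\mathrm{MSE}_h + N_0} = \frac{q(q\mathrm{MSE}_h + N_0)}{q\|\hat{\mathbf{h}}\|^2 + q\mathrm{MSE}_h + N_0}. \tag{5.162}$$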
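The closed form (5.161) can be verified without any matrix inversion. The check is that $\mathbf{B}\mathbf{w} = \mathbf{a}$ holds; evaluating $q - \mathbf{a}^{\mathrm{H}}\mathbf{w}$ then reproduces (5.162). The vector $\hat{\mathbf{h}}$ and the parameter values in this sketch are arbitrary illustrative numbers, and the helper names are hypothetical.

```python
def lmmse_combiner(h_hat, q, mse_h, N0):
    """Closed-form LMMSE combining vector (5.161)."""
    norm2 = sum(abs(v) ** 2 for v in h_hat)
    scale = q / (q * norm2 + q * mse_h + N0)
    return [scale * v for v in h_hat]


def apply_B(w, h_hat, q, mse_h, N0):
    """Return B w with B = q h_hat h_hat^H + (q MSE_h + N0) I_M, as in (5.159)."""
    inner = sum(v.conjugate() * wi for v, wi in zip(h_hat, w))  # h_hat^H w
    return [q * v * inner + (q * mse_h + N0) * wi for v, wi in zip(h_hat, w)]


def min_mse(h_hat, q, mse_h, N0):
    """q - a^H B^{-1} a = q - a^H w with a = q h_hat, cf. (5.160) and (5.162)."""
    w = lmmse_combiner(h_hat, q, mse_h, N0)
    aH_w = sum(q * v.conjugate() * wi for v, wi in zip(h_hat, w))
    return q - aH_w.real


if __name__ == "__main__":
    h_hat = [0.9 + 0.1j, -0.4 + 0.7j, 0.2 - 0.5j]
    q, mse_h, N0 = 1.0, 1 / 3, 0.5
    w = lmmse_combiner(h_hat, q, mse_h, N0)
    Bw = apply_B(w, h_hat, q, mse_h, N0)
    print("max |Bw - a| =", max(abs(bw - q * v) for bw, v in zip(Bw, h_hat)))
    print("MSE via q - a^H w:", min_mse(h_hat, q, mse_h, N0))
```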


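We can finally compute the capacity lower bound in (5.158) as

$$C \geq \left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\log_2\left(\frac{q\|\hat{\mathbf{h}}\|^2 + q\mathrm{MSE}_h + N_0}{q\mathrm{MSE}_h + N_0}\right)\right\} = \left(1 - \frac{1}{L_c}\right)\mathbb{E}\left\{\log_2\left(1 + \frac{q\|\hat{\mathbf{h}}\|^2}{q\mathrm{MSE}_h + N_0}\right)\right\}. \tag{5.163}$$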
This is the natural SIMO extension of the SISO capacity bound in (5.156)
and has the same pre-log factor $(1-1/L_c)$. There is one key difference: the
channel gain in the numerator is $\|\hat{\mathbf{h}}\|^{2}$ in the SIMO case instead of $|\hat{h}|^{2}$. Since
these terms have the means $M(\beta-\mathrm{MSE}_h)$ and $\beta-\mathrm{MSE}_h$, respectively, we
can conclude that a beamforming gain proportional to $M$ is achievable in
the SIMO setup despite the imperfect CSI. Notably, the interference term
caused by the CSI imperfection remains equal to $q\mathrm{MSE}_h$, independently of the
number of receive antennas. This is remarkable because the total variance of
the signal received over the unknown portion of the channel is $qM\mathrm{MSE}_h$. Since
$\tilde{\mathbf{h}}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathrm{MSE}_h\mathbf{I}_M)$, that power is uniformly distributed over all $M$ receiver
dimensions and, thus, only a fraction $1/M$ of it appears on average in the
dimension utilized by the LMMSE combining. This is the same phenomenon
that makes the noise power independent of the number of receive antennas in
the SNR expression, even if the total noise power in the receiver hardware
is proportional to $M$. In conclusion, a SIMO receiver becomes increasingly
robust to CSI imperfections as the number of antennas increases because only
the desired signal power increases with $M$.
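The robustness argument can be illustrated with a Monte Carlo estimate of the bound (5.163). This is a sketch under stated assumptions: i.i.d. Rayleigh fading, $\mathrm{MSE}_h=\beta N_0/(\beta q+N_0)$ as in (5.167), and hypothetical default values for $\beta$, $N_0$, and $L_c$. Only $\|\hat{\mathbf{h}}\|^2$ grows with $M$, while the interference term $q\mathrm{MSE}_h$ stays fixed.

```python
import numpy as np

def simo_bound(M, q, beta=1.0, N0=1.0, Lc=200, trials=20_000, seed=0):
    """Monte Carlo estimate of the SIMO capacity lower bound (5.163).

    Assumes MSE_h = beta*N0/(beta*q + N0), as in (5.167)."""
    rng = np.random.default_rng(seed)
    mse_h = beta * N0 / (beta * q + N0)
    # Channel estimates with i.i.d. N_C(0, beta - mse_h) entries
    h_hat = np.sqrt((beta - mse_h) / 2) * (
        rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M)))
    gain = np.sum(np.abs(h_hat) ** 2, axis=1)      # ||h_hat||^2 grows with M
    snr = q * gain / (q * mse_h + N0)              # q*MSE_h does not grow with M
    return (1 - 1 / Lc) * np.mean(np.log2(1 + snr))

for M in (1, 4, 16):
    print(M, round(simo_bound(M, q=10.0), 2))
```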
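Example 5.15. What is the relative SNR loss caused by imperfect CSI? What
happens to this loss when the SNR is high?

If we define a random vector $\mathbf{e}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{I}_M)$ with unit-variance complex
Gaussian entries, we can notice that the instantaneous SNR in (5.163) satisfies

$$
\frac{q\|\hat{\mathbf{h}}\|^{2}}{q\mathrm{MSE}_h+N_0}\sim\frac{q(\beta-\mathrm{MSE}_h)}{q\mathrm{MSE}_h+N_0}\|\mathbf{e}\|^{2}, \tag{5.164}
$$

where the expressions are equally distributed since $\beta-\mathrm{MSE}_h$ is the variance
of each entry in $\hat{\mathbf{h}}$. Similarly, the instantaneous SNR in (5.118) with perfect
CSI satisfies

$$
\frac{q\|\mathbf{h}\|^{2}}{N_0}\sim\frac{q\beta}{N_0}\|\mathbf{e}\|^{2}. \tag{5.165}
$$

The relative SNR loss caused by the CSI imperfections becomes

$$
\frac{\frac{q(\beta-\mathrm{MSE}_h)}{q\mathrm{MSE}_h+N_0}\|\mathbf{e}\|^{2}}{\frac{q\beta}{N_0}\|\mathbf{e}\|^{2}}=\frac{N_0\left(1-\frac{\mathrm{MSE}_h}{\beta}\right)}{q\mathrm{MSE}_h+N_0}. \tag{5.166}
$$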
The relative loss at high SNRs can be obtained by considering the asymptotic
limit where $q\to\infty$. The MSE in (5.138) has the limit

$$
\mathrm{MSE}_h=\frac{\beta N_0}{\beta q+N_0}\to 0, \tag{5.167}
$$

but it only decays as $1/q$, so the interference term $q\mathrm{MSE}_h$ has the
non-zero limit

$$
q\mathrm{MSE}_h=\frac{q\beta N_0}{\beta q+N_0}\to\frac{\beta N_0}{\beta}=N_0. \tag{5.168}
$$

Hence, the relative SNR loss in (5.166) has the asymptotic limit

$$
\frac{N_0\left(1-\frac{\mathrm{MSE}_h}{\beta}\right)}{q\mathrm{MSE}_h+N_0}\to\frac{N_0(1-0)}{N_0+N_0}=\frac{1}{2}. \tag{5.169}
$$

In conclusion, there is a 3 dB loss in SNR in the capacity expression in the
high-SNR regime, but the capacity nevertheless grows unboundedly with $q$.
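The convergence toward the 3 dB loss can be tabulated with a few lines of code. This sketch evaluates (5.166) under the assumption $\mathrm{MSE}_h=\beta N_0/(\beta q+N_0)$ from (5.167), with hypothetical defaults $\beta=N_0=1$.

```python
import numpy as np

def relative_snr_loss(q, beta=1.0, N0=1.0):
    """Relative SNR loss (5.166), assuming MSE_h = beta*N0/(beta*q + N0) as in (5.167)."""
    mse_h = beta * N0 / (beta * q + N0)
    return N0 * (1 - mse_h / beta) / (q * mse_h + N0)

for snr_db in (0, 10, 20, 30, 40):
    q = 10 ** (snr_db / 10)
    loss = relative_snr_loss(q)
    print(f"{snr_db:2d} dB: loss factor {loss:.4f} ({10 * np.log10(loss):.2f} dB)")
```

With $\beta=N_0=1$, the expression simplifies to $q/(2q+1)$, which makes the limit $1/2$ in (5.169) easy to see; the loss is even larger than 3 dB at low SNR (a factor $1/3$ at $q=1$).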


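The ergodic capacity in the MIMO scenario with $K$ transmit antennas
and $M$ receive antennas can be lower bounded similarly to the SIMO scenario.
We recall from Section 3.4.3 that the received signal $\mathbf{y}\in\mathbb{C}^{M}$ when using
an arbitrary precoding matrix $\mathbf{P}=[\mathbf{p}_1,\ldots,\mathbf{p}_K]\in\mathbb{C}^{K\times K}$ with unit-norm
columns is

$$
\mathbf{y}=(\hat{\mathbf{H}}+\tilde{\mathbf{H}})\mathbf{P}\bar{\mathbf{x}}+\mathbf{n}=\sum_{k=1}^{K}\hat{\mathbf{H}}\mathbf{p}_k\bar{x}_k+\underbrace{\sum_{k=1}^{K}\tilde{\mathbf{H}}\mathbf{p}_k\bar{x}_k}_{=\tilde{\mathbf{e}}}+\mathbf{n}, \tag{5.170}
$$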
where $\bar{\mathbf{x}}=[\bar{x}_1,\ldots,\bar{x}_K]^{T}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{Q})$ contains the independent data symbols
and $\mathbf{Q}=\mathrm{diag}(q_1,\ldots,q_K)$ is a diagonal power allocation matrix. The coefficients
in $\mathbf{Q}$ should be selected to satisfy $\sum_{k=1}^{K}q_k=q$ so that the maximum
symbol power is used. What is new in this section is the block-fading model,
where $\hat{\mathbf{H}}\in\mathbb{C}^{M\times K}$ is the known channel estimate with i.i.d. entries distributed
according to (5.143) as $\hat{h}_{m,k}\sim\mathcal{N}_{\mathbb{C}}(0,\beta-\mathrm{MSE}_h)$, while $\tilde{\mathbf{H}}$ is the estimation error
with i.i.d. entries distributed according to (5.144) as $\tilde{h}_{m,k}\sim\mathcal{N}_{\mathbb{C}}(0,\mathrm{MSE}_h)$.
The estimate is obtained by transmitting $K$ pilots, while $L_c-K$ symbols per
coherence block are used for data transmission. The received signal in (5.170)
can be divided into the known first term, where $\hat{\mathbf{H}}$ acts as the channel matrix,
and the unknown term $\tilde{\mathbf{e}}$, which we know from earlier in this section will act
as extra noise. This term has (conditional) zero mean $\mathbb{E}\{\tilde{\mathbf{e}}|\hat{\mathbf{H}}\}=\mathbf{0}$ since the
data symbols have zero mean, while the conditional covariance matrix can be
computed as


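$$
\mathbb{E}\{\tilde{\mathbf{e}}\tilde{\mathbf{e}}^{H}|\hat{\mathbf{H}}\}=\sum_{k=1}^{K}q_k\,\mathbb{E}\{\tilde{\mathbf{H}}\mathbf{p}_k\mathbf{p}_k^{H}\tilde{\mathbf{H}}^{H}|\hat{\mathbf{H}}\}=\sum_{k=1}^{K}q_k\mathrm{MSE}_h\|\mathbf{p}_k\|^{2}\mathbf{I}_M=q\mathrm{MSE}_h\mathbf{I}_M, \tag{5.171}
$$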
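where the first equality follows from the independence of the data symbols and
the second equality follows from the fact that $\tilde{\mathbf{H}}\mathbf{p}_k\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathrm{MSE}_h\|\mathbf{p}_k\|^{2}\mathbf{I}_M)$,
which is independent of $\hat{\mathbf{H}}$. The last equality utilizes that $\|\mathbf{p}_k\|=1$ and
$\sum_{k=1}^{K}q_k=q$ by assumption.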
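The covariance result (5.171) can be checked by simulation. The following is a minimal sketch with a hypothetical error variance, power allocation, and a random precoder whose columns are normalized to unit norm; the sample covariance of $\tilde{\mathbf{e}}=\tilde{\mathbf{H}}\mathbf{P}\bar{\mathbf{x}}$ should approach $q\mathrm{MSE}_h\mathbf{I}_M$ regardless of how $\mathbf{P}$ and $\mathbf{Q}$ are chosen.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, trials = 4, 3, 100_000
mse_h = 0.2                         # hypothetical error variance
q_k = np.array([0.5, 0.3, 0.2])     # hypothetical power allocation, q = 1

def cn(shape, var):
    """i.i.d. circularly symmetric complex Gaussian samples with variance var."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

# Arbitrary precoder with unit-norm columns
P = cn((K, K), 1.0)
P /= np.linalg.norm(P, axis=0)

H_err = cn((trials, M, K), mse_h)              # estimation errors H_tilde
x = cn((trials, K), 1.0) * np.sqrt(q_k)        # data symbols with variances q_k
e = np.einsum("tmk,kj,tj->tm", H_err, P, x)    # e = H_tilde P x per realization
C = e.T @ e.conj() / trials                    # sample covariance E{e e^H}
print(np.round(C.real, 3))                     # close to q*MSE_h*I_M = 0.2*I_4
```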

We demonstrated in Section 3.4.3 that the maximum data rate
can be achieved through the LMMSE-SIC procedure, where the $K$ transmitted
signals are decoded sequentially: each signal is decoded while treating the
not-yet-decoded signals as noise and is then subtracted from the received signal. In
the block-fading scenario, the decoding of each signal will be subject to the
extra noise vector $\tilde{\mathbf{e}}$, which cannot be removed at the receiver since $\tilde{\mathbf{H}}$ is
unknown. Hence, based on the previous results in this section, we conclude
that the variance $q\mathrm{MSE}_h$ of the entries in $\tilde{\mathbf{e}}$ can be added to the receiver noise
$\mathbf{n}$. An achievable ergodic rate during the data transmission is then obtained
by generalizing (3.106) as


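$$
\mathbb{E}\left\{\log_2\left(\det\left(\mathbf{I}_M+\frac{1}{q\mathrm{MSE}_h+N_0}\hat{\mathbf{H}}\mathbf{P}\mathbf{Q}\mathbf{P}^{H}\hat{\mathbf{H}}^{H}\right)\right)\right\}, \tag{5.172}
$$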


where the mean value is computed with respect to the random channel
estimates in different coherence blocks. Since the channel estimate features
i.i.d. fading, we further know from Section 5.4.2 that the transmit covariance matrix $\mathbf{P}\mathbf{Q}\mathbf{P}^{H}=\frac{q}{K}\mathbf{I}_K$
maximizes the expression in (5.172) when the transmitter is unaware of the
channel. In summary, a capacity lower bound in the MIMO case is


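$$
C\geq\left(1-\frac{K}{L_c}\right)\mathbb{E}\left\{\log_2\left(\det\left(\mathbf{I}_M+\frac{q/K}{q\mathrm{MSE}_h+N_0}\hat{\mathbf{H}}\hat{\mathbf{H}}^{H}\right)\right)\right\}, \tag{5.173}
$$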


where we also accounted for the fact that a fraction $K/L_c$ of each coherence
block is used for transmitting pilots. This capacity bound shares all the
essential features with the previous bounds in this section. There is a pre-log
factor $(1-K/L_c)$, the true channel matrix $\mathbf{H}$ is replaced with the estimated
channel matrix $\hat{\mathbf{H}}$, and the estimation errors result in an SNR loss created by
the extra noise variance $q\mathrm{MSE}_h$. Hence, we can achieve the same multiplexing
and beamforming gains under imperfect CSI as with perfect CSI but starting
from a worse situation with a smaller pre-log factor and a relative SNR loss.

Figure 5.25: The ergodic rate over i.i.d. Rayleigh fading channels, which are either modeled as
fast-fading with perfect CSI at the receiver or block-fading with imperfect CSI. We consider a
SIMO channel with M = 4 antennas and a MIMO channel with M = K = 4 antennas.

Figure 5.25 compares the ergodic capacity with perfect CSI at the receiver
(as in Section 5.4.2) with the lower bounds obtained in this section for block-fading
with imperfect CSI. We assume coherence blocks with $L_c=200$ symbols
and consider a MIMO setup with $M=K=4$ antennas and a SIMO setup
with $M=4$ receive antennas. The figure shows the ergodic rates as functions
of the average SNR $q\beta/N_0$. The pre-log factor is almost one in the considered
setups (e.g., $1-K/L_c=0.98$ in the MIMO case), so the main impact of the imperfect CSI is the relative SNR loss in the
rate expressions. This loss shifts the rate curves to the right
by approximately 3 dB in the high-SNR regime, as predicted in Example 5.15.
For instance, the rate achieved with perfect CSI at 27 dB is nearly the same as
the rate achieved with imperfect CSI at 30 dB. Apart from this shift, the curves
confirm that the same beamforming and multiplexing gains are achieved with
perfect and imperfect CSI; thus, MIMO systems can operate effectively also
when the receiver has imperfect channel knowledge.
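A Monte Carlo sketch of the comparison in Figure 5.25 is shown next. It estimates the perfect-CSI ergodic capacity and the bound (5.173) over i.i.d. Rayleigh fading. The error variance is an assumption: we reuse $\mathrm{MSE}_h=\beta N_0/(\beta q+N_0)$ from (5.167) for every entry, which may differ from the exact pilot setup behind (5.144), so the numbers are illustrative rather than a reproduction of the figure.

```python
import numpy as np

def ergodic_rates(M, K, snr_db, Lc=200, beta=1.0, N0=1.0, trials=2000, seed=0):
    """Monte Carlo estimates of the perfect-CSI ergodic capacity and the
    imperfect-CSI lower bound (5.173) over i.i.d. Rayleigh fading.

    Assumption: MSE_h = beta*N0/(beta*q + N0) for every channel entry."""
    rng = np.random.default_rng(seed)
    q = 10 ** (snr_db / 10) * N0 / beta            # average SNR q*beta/N0
    mse_h = beta * N0 / (beta * q + N0)
    G = (rng.standard_normal((trials, M, K))
         + 1j * rng.standard_normal((trials, M, K))) / np.sqrt(2)
    H = np.sqrt(beta) * G                          # true channel, perfect CSI
    H_hat = np.sqrt(beta - mse_h) * G              # estimate: same shape, smaller variance
    I = np.eye(M)

    def gram(X):
        # Batch of X X^H matrices
        return X @ X.conj().transpose(0, 2, 1)

    def mean_log2det(A):
        # slogdet works on stacks of matrices; [1] is the natural log of |det|
        return np.mean(np.linalg.slogdet(A)[1]) / np.log(2)

    perfect = mean_log2det(I + (q / K) / N0 * gram(H))
    bound = (1 - K / Lc) * mean_log2det(I + (q / K) / (q * mse_h + N0) * gram(H_hat))
    return perfect, bound

for snr_db in (0, 10, 20, 30):
    print(snr_db, [round(r, 2) for r in ergodic_rates(4, 4, snr_db)])
```

Reusing the same Gaussian draws for $\mathbf{H}$ and $\hat{\mathbf{H}}$ is valid here because both have i.i.d. entries that differ only in variance.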


5.5.3 Ergodic Rate with Imperfect CSI Available Everywhere 


We have derived ergodic rates achievable in a block-fading scenario where
the receiver obtains CSI through pilot transmission, but the transmitter
is unaware of the channel realization. This section considers the scenario
where the transmitter somehow gains access to the same channel estimate,
denoted by $\hat{\mathbf{H}}$ in the MIMO scenario with $K$ transmit antennas and $M$ receive
antennas. This CSI can be utilized to optimize the transmission, particularly
the precoding. We return to the achievable rate expression in (5.172) and
optimize the precoding matrix $\mathbf{P}$ and power allocation matrix $\mathbf{Q}$:


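$$
R=\mathbb{E}\left\{\max_{\substack{\mathbf{P}:\,\|\mathbf{p}_k\|=1\\ q_1,\ldots,q_K\geq 0,\ \sum_{k=1}^{K}q_k=q}}\log_2\left(\det\left(\mathbf{I}_M+\frac{1}{q\mathrm{MSE}_h+N_0}\hat{\mathbf{H}}\mathbf{P}\mathbf{Q}\mathbf{P}^{H}\hat{\mathbf{H}}^{H}\right)\right)\right\}. \tag{5.174}
$$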


The capacity is lower bounded as $C\geq(1-K/L_c)R$ when including the pre-log
factor caused by transmitting $K$ pilots per coherence block. The key difference
from the previous section is that the precoding optimization is done inside the
mean value, once per coherence block. This optimization problem coincides
with the problem considered in Section 3.4 for deterministic channel matrices.
More precisely, let $\hat{\mathbf{H}}=\hat{\mathbf{U}}\hat{\mathbf{S}}\hat{\mathbf{V}}^{H}$ denote the SVD of the channel estimate, where
$\hat{s}_1,\ldots,\hat{s}_r$ denote the $r=\min(M,K)$ non-zero singular values.¹² It follows
that $\mathbf{P}=\hat{\mathbf{V}}$ is the rate-maximizing precoding, while $q_1,\ldots,q_K$ should be
selected based on the water-filling power allocation. By utilizing Theorem 3.1,
we can rewrite (5.174) as


$$
R=\mathbb{E}\left\{\sum_{k=1}^{r}\log_2\left(1+\frac{q_k^{\mathrm{opt}}\hat{s}_k^{2}}{q\mathrm{MSE}_h+N_0}\right)\right\}, \tag{5.175}
$$


where the power allocation in the coherence block with the singular value
realization $\hat{s}_1,\ldots,\hat{s}_r$ is

$$
q_k^{\mathrm{opt}}=\max\left(\mu-\frac{q\mathrm{MSE}_h+N_0}{\hat{s}_k^{2}},0\right),\quad k=1,\ldots,r, \tag{5.176}
$$

and the variable $\mu$ is selected to make $\sum_{k=1}^{r}q_k^{\mathrm{opt}}=q$. The water-filling is also
affected by the imperfect CSI since the noise variance is increased by $q\mathrm{MSE}_h$.
The singular values have the same distribution as those of the true channel
matrix $\mathbf{H}$, except for a scaling by $\sqrt{(\beta-\mathrm{MSE}_h)/\beta}$, since the variance of each
entry is reduced by the factor $(\beta-\mathrm{MSE}_h)/\beta$.
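The water-filling rule (5.176) with the CSI-error-inflated noise can be sketched as follows. The function below is a standard water-filling search (not necessarily the procedure used to produce the figures), and the antenna numbers, powers, and $\mathrm{MSE}_h=\beta N_0/(\beta q+N_0)$ are hypothetical assumptions. For one channel-estimate realization, it compares the inner terms of (5.175) and (5.173).

```python
import numpy as np

def water_filling(gains, q, sigma2):
    """Water-filling power allocation over channel gains s_k^2, cf. (5.176).

    gains: squared singular values s_k^2 (> 0), q: total power (> 0),
    sigma2: effective noise variance q*MSE_h + N0."""
    levels = sigma2 / np.asarray(gains, dtype=float)   # (q*MSE_h + N0) / s_k^2
    order = np.argsort(levels)                          # strongest modes first
    for r in range(len(levels), 0, -1):                 # try r active modes
        active = levels[order[:r]]
        mu = (q + active.sum()) / r                      # candidate water level
        if mu > active[-1]:                              # weakest active mode gets power
            return np.maximum(mu - levels, 0.0), mu
    raise ValueError("q must be positive")

rng = np.random.default_rng(2)
M, K, N0, beta, q = 2, 4, 1.0, 1.0, 1.0          # hypothetical setup with K > M
mse_h = beta * N0 / (beta * q + N0)              # assumption, as in (5.167)
sigma2 = q * mse_h + N0                          # noise plus CSI-error interference
H_hat = np.sqrt((beta - mse_h) / 2) * (
    rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))

s = np.linalg.svd(H_hat, compute_uv=False)       # r = min(M, K) singular values
p, mu = water_filling(s ** 2, q, sigma2)
rate_csit = np.sum(np.log2(1 + p * s ** 2 / sigma2))    # inner term of (5.175)
_, logdet = np.linalg.slogdet(np.eye(M) + (q / K) / sigma2 * H_hat @ H_hat.conj().T)
rate_no_csit = logdet / np.log(2)                        # inner term of (5.173)
print(p.sum(), rate_csit, rate_no_csit)
```

Since water-filling over the SVD modes maximizes the rate over all transmit covariances with trace $q$, `rate_csit` can never fall below the equal-power value `rate_no_csit` for the same realization.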
The gain in ergodic rate from having CSI at the transmitter can be
quantified under i.i.d. Rayleigh fading by computing the ratio between the
rate in (5.175) with CSI at the transmitter and the rate in (5.173) without CSI
at the transmitter. The difference between the two ergodic rates stems only from
whether the precoding and power allocation are adapted to the SVD of the estimated channel matrix or not.
Figure 5.26 shows the relative rate gain as a function of the SNR for different
numbers of transmit and receive antennas. We notice that having CSI at the
transmitter is primarily useful at lower SNRs, where transmit beamforming
gains can be achieved by only transmitting in the estimated channel's strongest
direction(s). The gain is larger when the transmitter has more antennas than
the receiver (solid black line) and smaller when the receiver has more antennas
than the transmitter (blue dash-dotted line). The benefit of having CSI at the
transmitter vanishes asymptotically as the SNR grows, except if $K>M$, in which case it remains vital
for the transmitter to concentrate the transmit power in the signal dimensions
that reach the receiver. In conclusion, feeding back channel estimates to the
transmitter has clear benefits, particularly at low SNR, where transmit
beamforming gains are more valuable than multiplexing gains.

¹² The maximum number of non-zero singular values is $\min(M,K)$, and all these values will
be non-zero with probability 1 under i.i.d. Rayleigh fading, as explained in Example 5.4.

Figure 5.26: The relative gain in ergodic rate from having CSI at the transmitter, which is
computed as the ratio between the rate in (5.175) with CSI and the rate in (5.173) without
CSI. We consider i.i.d. Rayleigh fading MIMO channels with different numbers of transmit and
receive antennas.


5.6 Sparse Multipath Propagation and Dual Polarization 


The i.i.d. Rayleigh fading model was derived in Section 5.1.2 by considering
the deployment of half-wavelength-spaced ULAs in an isotropic rich multipath
environment. The statistical independence between the entries of the channel
vector/matrix simplifies the analysis of slow and fast fading channels, but it
is generally not an accurate model of practical channels. Multiple factors can
break the independence: array geometries other than ULAs, the use of directive
or dual-polarized antennas, and non-isotropic multipath environments. While
the system designer can control the first two factors, the propagation
environment is essentially given by nature, and its non-isotropic features
become particularly evident as the number of antennas increases and the
wavelength shrinks. Therefore, we will develop a model for the channel fading
distribution that can be utilized in more realistic propagation environments.
We call it sparse multipath propagation to distinguish it from the rich multipath
propagation assumption made previously in this chapter. We will first consider
single-polarized antennas and then dual-polarized antennas.

Figure 5.27: Illustration of a sparse multipath propagation environment where only signal
components that leave the transmitter in some distinct angular directions $(\varphi_i,\theta_i)$ reach the
receiver. The figure shows three such directions. The same setup was considered in Figure 2.14.

We begin by considering the MISO channel in Figure 5.27, where a base
station equipped with $M$ antennas transmits to a single-antenna user. Three
propagation paths are indicated in the figure: one direct path and two
reflected/scattered paths. Each path is associated with a particular azimuth
angle $\varphi_i$ and elevation angle $\theta_i$, representing the direction of the path as seen
from the transmitter. We let $L$ denote the total number of propagation paths
in this section. When the reflecting objects and receiver are in the far-field
of the transmitter, we can utilize the array response vector $\mathbf{a}(\varphi,\theta)\in\mathbb{C}^{M}$
of the transmitter array to model each propagation path. A methodology
for computing array response vectors for any specific array geometry was
provided in Section 4.5. In this section, we will treat it as an arbitrary vector
that might include antenna gains. The $i$th propagation path is associated
with a signal attenuation $\alpha_i\in[0,1]$ and a phase-shift $\psi_i\in[-\pi,\pi)$, where we
utilize the same notation as in Section 5.1. The components of the radiated
signal that reach the receiver over the different paths are superimposed. The
channel vector can then be expressed as


$$
\mathbf{h}=\sum_{i=1}^{L}\alpha_i e^{-j\psi_i}\mathbf{a}(\varphi_i,\theta_i). \tag{5.177}
$$
Since the array response vectors assign different phase-shifts to different
antennas, the summation in (5.177) will lead to different complex numbers for
different antennas. The signals emitted from some transmit antennas might be
superimposed constructively over the multipath channel, while the signals from
other antennas might superimpose destructively. Hence, the transmitter should
adapt the amplitude and phase of the signal at each antenna to maximize the SNR. The
use of MRT with the precoding vector $\mathbf{p}=\mathbf{h}^{*}/\|\mathbf{h}\|$ finds the SNR-maximizing
way of utilizing the multipath propagation environment. The radiated signal
is a superposition of beams focused in the directions of the $L$ different paths
since $\mathbf{h}$ is a linear combination of $\mathbf{a}(\varphi_i,\theta_i)$ for $i=1,\ldots,L$.

Figure 5.28: The beamforming gain that is observed in different directions $\varphi$ when a ULA with
$M=10$ antennas transmits using MRT. The dashed curve considers the LOS case with a single
propagation path in the direction $\varphi_1=\pi/6$. The solid curve considers the NLOS case with
$L=4$ equally strong paths in the directions $\varphi_1=\pi/6$, $\varphi_2=\pi/12$, $\varphi_3=-\pi/4$, and $\varphi_4=-\pi/3$
(and phase-shifts that are progressively shifted by $\pi/3$). These angles are marked with stars. The
beam pattern becomes increasingly complex and ceases to have a distinct angular directivity
when the number of paths increases.

Whenever the channel contains multiple paths with distinctly different 
angles, the radiated signal generated by MRT will no longer look like a beam 
pointing in a single angular direction. Figure 5.28 illustrates the angular 
beam pattern when transmitting in a single direction (dashed curve) and 
when the channel consists of L = 4 paths (solid curve). The beam pattern 
is more complex in the latter case, but it has four peaks in roughly the 
same directions as those leading to the objects creating the multipaths (those 
directions are marked with stars). The beam pattern is a superposition of 
angular beams focused precisely toward these objects, but when their main 
beams and side-lobes interact, the combined angular beam pattern is smeared 
out. As more paths are added to the channel vector, as is typically the case 
in practice, the radiation pattern will look increasingly complex and lack a 
distinct angular directivity. The main point is that one should not expect the
transmitted signal in multiple antenna communications to look like an angular
beam except in the special case of free-space LOS propagation considered in
Chapter 4. The only goal of precoding is to radiate the same data signal from all
antennas, with amplitudes and phase-shifts selected so that the components add constructively at the receiver location.
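The MRT beam patterns discussed around Figure 5.28 can be sketched numerically. This is a hedged illustration, not the code behind the figure: it assumes a half-wavelength ULA response with entries $e^{-j\pi(m-1)\sin(\varphi)}$ (azimuth only, $\theta=0$), the phase-shifts $\psi_i=(i-1)\pi/3$, and the beamforming gain $|\mathbf{a}^{T}(\varphi)\mathbf{p}|^{2}$ in direction $\varphi$.

```python
import numpy as np

M = 10
m = np.arange(M)

def a(phi):
    """Assumed half-wavelength ULA response in azimuth (theta = 0)."""
    return np.exp(-1j * np.pi * m * np.sin(phi))

def mrt_beam_pattern(h, phis):
    """Beamforming gain |a(phi)^T p|^2 in each direction for MRT p = h*/||h||."""
    p = h.conj() / np.linalg.norm(h)
    A = np.stack([a(phi) for phi in phis])        # one row per direction
    return np.abs(A @ p) ** 2

phis = np.linspace(-np.pi / 2, np.pi / 2, 721)
h_los = a(np.pi / 6)                                     # single LOS path
path_angles = [np.pi / 6, np.pi / 12, -np.pi / 4, -np.pi / 3]
h_nlos = sum(np.exp(-1j * i * np.pi / 3) * a(ang)        # phases shifted by pi/3
             for i, ang in enumerate(path_angles))
g_los = mrt_beam_pattern(h_los, phis)
g_nlos = mrt_beam_pattern(h_nlos, phis)
print(np.degrees(phis[np.argmax(g_los)]), g_los.max())   # about 30 degrees, gain M = 10
```

Plotting `g_los` and `g_nlos` against `phis` should give curves that are qualitatively similar to Figure 5.28, although the exact shape depends on the assumed sign conventions. By the Cauchy–Schwarz inequality, neither pattern can exceed the LOS peak gain $M$.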
5.6.1 Clustered Multipath Propagation


A key characteristic of sparse multipath propagation is that a limited number 
of physical objects creates the propagation paths. These objects are located 
in distinctly different angular directions, as seen from the transmitter and 
receiver. We will call each such object a multipath cluster and let $N_{\mathrm{cl}}$ denote
the number of clusters. We consider an NLOS channel, so the radiated signal
can only reach the receiver via one of these clusters. A multipath cluster
can give rise to many propagation paths, but they are all associated with
approximately the same pair of azimuth and elevation angles. This definition
requires the cluster to span only a tiny angular interval from the transmitter's
perspective; however, the physical size depends on how far away the cluster is
and how many antennas the transmitter has. Recall that the array cannot
resolve the angular differences between paths when these are smaller than the
half-power beamwidth.


Example 5.16. What is the half-power angular beamwidth when using a ULA
with $M=10$ antennas and $\Delta=\lambda/2$? How physically large can a multipath
cluster be if it is located 10 or 100 meters away?

The half-power beamwidth was considered in Example 4.7, where the
approximate formula $1.772/M$ radians was derived. This becomes $b=0.1772$
radians or $10.15^{\circ}$ with $M=10$ antennas. An angular interval $b$ radians wide
becomes $2d\tan(b/2)$ meters wide at a distance $d$ from the transmitter. The
width becomes 1.78 m if $d=10$ m and 17.8 m if $d=100$ m; thus, a single
multipath cluster can be physically large, particularly when it is far from
the antenna array. The green and blue circles in Figure 5.27 might represent
different multipath clusters. These can be buildings, cars, mountains, etc.
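The arithmetic in Example 5.16 can be reproduced directly. This short sketch only evaluates the approximate beamwidth $1.772/M$ from Example 4.7 and the chord-like width $2d\tan(b/2)$.

```python
import numpy as np

M = 10
b = 1.772 / M                          # half-power beamwidth from Example 4.7 (radians)
print(f"b = {b:.4f} rad = {np.degrees(b):.2f} deg")   # b = 0.1772 rad = 10.15 deg

widths = {d: 2 * d * np.tan(b / 2) for d in (10, 100)}  # width at distance d (meters)
for d, w in widths.items():
    print(f"d = {d} m: cluster width about {w:.2f} m")
```

The computed widths agree with the rounded values 1.78 m and 17.8 m stated in the example.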


We let $(\varphi_i,\theta_i)$ denote the common angles associated with all the paths
generated by the $i$th cluster. Moreover, we assume each cluster gives rise to
$N_{\mathrm{path}}$ different paths, each having a different attenuation $\alpha_{i,n}$ and phase-shift
$\psi_{i,n}$, for $n=1,\ldots,N_{\mathrm{path}}$. We then have a total of $L=N_{\mathrm{cl}}N_{\mathrm{path}}$ propagation
paths, but some share the same angles. Hence, we can reformulate (5.177) as


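$$
\mathbf{h}=\sum_{i=1}^{N_{\mathrm{cl}}}\left(\sum_{n=1}^{N_{\mathrm{path}}}\alpha_{i,n}e^{-j\psi_{i,n}}\right)\mathbf{a}(\varphi_i,\theta_i). \tag{5.178}
$$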


We can model this as a complex Gaussian distributed channel vector if $N_{\mathrm{path}}$
is large, but it will not result in what we previously called i.i.d. Rayleigh fading.
To demonstrate this, suppose the phase-shifts $\psi_{i,n}\sim U[-\pi,\pi)$ are independent
and uniformly distributed random variables and that the channel attenuations
$\alpha_{i,n}$ within any cluster $i$ are independent and identically distributed random
variables with an average channel gain denoted by

$$
\mathbb{E}\{\alpha_{i,n}^{2}\}=\frac{\beta_i}{N_{\mathrm{path}}}. \tag{5.179}
$$
This implies that $\sum_{n=1}^{N_{\mathrm{path}}}\mathbb{E}\{\alpha_{i,n}^{2}\}=\beta_i$ irrespective of the number of paths;
thus, the parameter $\beta_i\in[0,1]$ determines the average channel gain of the
entire cluster $i$. It follows from the central limit theorem in Lemma 2.6 that

$$
\sum_{n=1}^{N_{\mathrm{path}}}\alpha_{i,n}e^{-j\psi_{i,n}}\to\mathcal{N}_{\mathbb{C}}(0,\beta_i) \tag{5.180}
$$

in distribution as $N_{\mathrm{path}}\to\infty$. Note that the normalization by $N_{\mathrm{path}}$ in (5.179)
is essential to keep the variance $\beta_i$ constant as the number of paths increases;
otherwise, the variance of the summation would grow without bound instead of converging to a Gaussian
distribution. However, this is only a mathematical technicality because we
recall from Figure 5.4 that the Gaussian distribution is already a good
approximation for $N_{\mathrm{path}}>5$ if all the attenuations are equally large.
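The claim that a handful of paths already yields a nearly Gaussian cluster coefficient can be probed by simulation. This sketch assumes equal attenuations $\alpha_{i,n}=\sqrt{\beta_i/N_{\mathrm{path}}}$, a hypothetical $\beta_i=0.5$, and $N_{\mathrm{path}}=5$. It compares the mean power with $\beta_i$ and the tail probability $\Pr(|c_i|^2>\beta_i)$ with the value $e^{-1}$ that holds exactly for an $\mathcal{N}_{\mathbb{C}}(0,\beta_i)$ variable.

```python
import numpy as np

rng = np.random.default_rng(3)
beta_i, n_path, trials = 0.5, 5, 200_000        # hypothetical cluster gain, 5 paths

# Equal attenuations satisfying E{alpha^2} = beta_i / N_path, cf. (5.179)
alpha = np.sqrt(beta_i / n_path)
psi = rng.uniform(-np.pi, np.pi, size=(trials, n_path))   # independent uniform phases
c = np.sum(alpha * np.exp(-1j * psi), axis=1)            # cluster coefficient, cf. (5.180)

power = np.abs(c) ** 2
print(power.mean())                 # close to beta_i = 0.5
# For an N_C(0, beta_i) variable, |c|^2 is exponential: P(|c|^2 > beta_i) = exp(-1)
print((power > beta_i).mean(), np.exp(-1))
```

The mean power matches $\beta_i$ for any $N_{\mathrm{path}}$, while the tail probability only approaches $e^{-1}$; a small residual deviation at $N_{\mathrm{path}}=5$ is expected.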

We let $c_i\sim\mathcal{N}_{\mathbb{C}}(0,\beta_i)$ denote the fading variables in (5.180) for $i=1,\ldots,N_{\mathrm{cl}}$
and notice that these are independent random variables. When
there are many paths per cluster, we can therefore rewrite (5.178) as

$$
\mathbf{h}=\sum_{i=1}^{N_{\mathrm{cl}}}c_i\mathbf{a}(\varphi_i,\theta_i). \tag{5.181}
$$

We call this scenario clustered rich multipath propagation, where the word
“rich” signifies that there are many paths, so we get Rayleigh fading. However,
these paths are not isotropically distributed over the angular domain but are
confined to a limited number of multipath clusters with distinct angles.

The channel response $\mathbf{h}$ in (5.181) is a linear combination of the $N_{\mathrm{cl}}$ array
response vectors $\mathbf{a}(\varphi_i,\theta_i)$ using coefficients $c_i$ that are complex Gaussian
distributed. Hence, the channel has the distribution

$$
\mathbf{h}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{R}_h), \tag{5.182}
$$

where the covariance matrix can be computed as

$$
\mathbf{R}_h=\mathbb{E}\{\mathbf{h}\mathbf{h}^{H}\}=\sum_{i=1}^{N_{\mathrm{cl}}}\sum_{n=1}^{N_{\mathrm{cl}}}\mathbb{E}\{c_ic_n^{*}\}\mathbf{a}(\varphi_i,\theta_i)\mathbf{a}^{H}(\varphi_n,\theta_n)=\sum_{i=1}^{N_{\mathrm{cl}}}\beta_i\mathbf{a}(\varphi_i,\theta_i)\mathbf{a}^{H}(\varphi_i,\theta_i). \tag{5.183}
$$


The last step in (5.183) follows from utilizing that $\mathbb{E}\{c_ic_n^{*}\}=0$ when $i\neq n$,
since the variables are independent and have zero means, while $\mathbb{E}\{|c_i|^{2}\}=\beta_i$. We will refer to $\mathbf{R}_h$
as the spatial correlation matrix since it describes how the channel coefficients
at different spatial locations (i.e., antenna locations) are correlated.
Clustered rich multipath propagation gives rise to spatially correlated
Rayleigh fading since each entry of $\mathbf{h}$ has a magnitude that is Rayleigh
distributed, but the entries are statistically correlated. Practical channels
generally feature spatially correlated fading. The level of correlation can vary
depending on the number of multipath clusters and their angular locations.
One way to measure the level of correlation is by inspecting the eigenvalue
spread of $\mathbf{R}_h$; a large spread represents a high correlation, while a small spread
represents a low correlation. Note that all the eigenvalues are equal when
considering i.i.d. Rayleigh fading with $\mathbf{R}_h=\mathbf{I}_M$. The MISO channel model
in (5.181)–(5.183) can also be utilized for SIMO channels by interchanging
the roles of the transmitter and receiver.
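The discrete correlation model (5.183) and its eigenvalue spread can be explored with a short sketch. It assumes a half-wavelength ULA response with entries $e^{-j\pi(m-1)\sin(\varphi)}$ ($\theta=0$) and two hypothetical clusters. With $N_{\mathrm{cl}}=2$, only two eigenvalues are non-zero, which is an extreme eigenvalue spread compared with the all-equal eigenvalues of i.i.d. fading. The sample covariance of realizations drawn from (5.181) is also compared with $\mathbf{R}_h$.

```python
import numpy as np

M = 10
m = np.arange(M)

def a(phi):
    """Assumed half-wavelength ULA response in azimuth (theta = 0)."""
    return np.exp(-1j * np.pi * m * np.sin(phi))

# Hypothetical clusters: azimuth angles and average gains beta_i
phis = np.array([np.pi / 6, -np.pi / 4])
betas = np.array([0.6, 0.4])
A = np.stack([a(phi) for phi in phis], axis=1)     # M x N_cl matrix of array responses

R = (A * betas) @ A.conj().T                       # (5.183): sum_i beta_i a_i a_i^H
eig = np.linalg.eigvalsh(R)[::-1]                   # eigenvalues, largest first
print(np.round(eig, 3))                             # only N_cl = 2 non-zero eigenvalues

# Sample covariance of realizations h = sum_i c_i a_i with c_i ~ N_C(0, beta_i)
rng = np.random.default_rng(4)
trials = 50_000
c = np.sqrt(betas / 2) * (rng.standard_normal((trials, 2)) + 1j * rng.standard_normal((trials, 2)))
h = c @ A.T                                         # one realization per row
R_sample = h.T @ h.conj() / trials
print(np.abs(R_sample - R).max())                   # small estimation error
```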

Instead of having a single angle pair $(\varphi_i,\theta_i)$ per cluster, we can associate
each one with a limited but continuous range of angles. This is natural since a
typical cluster is a physical object that the antenna array observes over a limited but
continuous range of angles. It then makes sense to replace the summation in
(5.183) with an integral expression. This can be done as

$$
\mathbf{R}_h=\beta\int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2}f_{\varphi,\theta}(\varphi,\theta)\,\mathbf{a}(\varphi,\theta)\mathbf{a}^{H}(\varphi,\theta)\,d\theta\,d\varphi, \tag{5.184}
$$

where $f_{\varphi,\theta}(\varphi,\theta)$ is the angular density function of the multipath components
and $\beta$ represents the total average channel gain summed over all clusters. The former function
describes how the multipath components are distributed over the angular
domain. The function is normalized such that $\int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2}f_{\varphi,\theta}(\varphi,\theta)\,d\theta\,d\varphi=1$.
The covariance model in (5.184) is a continuous generalization of the discrete
model in (5.183) because we can obtain the latter one by selecting $\beta=\sum_{i=1}^{N_{\mathrm{cl}}}\beta_i$
and $f_{\varphi,\theta}(\varphi,\theta)=\sum_{i=1}^{N_{\mathrm{cl}}}\frac{\beta_i}{\beta}\delta(\varphi-\varphi_i)\delta(\theta-\theta_i)$. Even if there is a finite number
of clusters, the continuous model is more realistic since the abrupt Dirac delta
function $\delta(\cdot)$ can be replaced with something smoother.


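Example 5.17. Suppose there is only one multipath cluster centered around
$(\varphi_1,\theta_1)$, but it spans a horizontal angular window of length $2\Delta_\varphi$ and a vertical
angular window of length $2\Delta_\theta$. What will be the spatial correlation matrix
$\mathbf{R}_h$ if the multipath components are uniformly distributed?

Under these conditions, the angular density function should be constant
over the specified intervals. This is achieved by

$$
f_{\varphi,\theta}(\varphi,\theta)=\begin{cases}\dfrac{1}{4\Delta_\varphi\Delta_\theta}, & |\varphi-\varphi_1|\leq\Delta_\varphi,\ |\theta-\theta_1|\leq\Delta_\theta,\\[4pt] 0, & \text{otherwise}.\end{cases} \tag{5.185}
$$

By substituting this into (5.184), we obtain the spatial correlation matrix

$$
\mathbf{R}_h=\frac{\beta}{4\Delta_\varphi\Delta_\theta}\int_{\varphi_1-\Delta_\varphi}^{\varphi_1+\Delta_\varphi}\int_{\theta_1-\Delta_\theta}^{\theta_1+\Delta_\theta}\mathbf{a}(\varphi,\theta)\mathbf{a}^{H}(\varphi,\theta)\,d\theta\,d\varphi. \tag{5.186}
$$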
This model originates from , and is known as the one-ring model since it 
appears in the hypothetical scenario where all the multipath components are 
on a ring-shaped object around the single-antenna device. 
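The integral in (5.186) usually has to be evaluated numerically. The following sketch treats the azimuth-only special case, where the elevation is fixed at $\theta=0$ so the density reduces to $1/(2\Delta_\varphi)$ over the azimuth window. It also assumes a half-wavelength ULA response with entries $e^{-j\pi(m-1)\sin(\varphi)}$ and uses the trapezoidal rule. A wider window should give a smaller largest eigenvalue, that is, weaker spatial correlation.

```python
import numpy as np

def one_ring_R(M, phi1, delta_phi, beta=1.0, n_grid=2001):
    """Approximate (5.186) for a uniform azimuth window (theta fixed at 0).

    Assumes a half-wavelength ULA response; the integral is evaluated with the
    trapezoidal rule, and the density 1/(2*delta_phi) is the azimuth-only
    counterpart of (5.185)."""
    m = np.arange(M)
    grid = np.linspace(phi1 - delta_phi, phi1 + delta_phi, n_grid)
    A = np.exp(-1j * np.pi * np.outer(m, np.sin(grid)))   # columns a(phi) on the grid
    w = np.full(n_grid, grid[1] - grid[0])                 # trapezoidal weights
    w[[0, -1]] *= 0.5
    return beta / (2 * delta_phi) * (A * w) @ A.conj().T

M = 10
for half_width_deg in (10, 40):
    R = one_ring_R(M, np.pi / 6, np.radians(half_width_deg))
    eig = np.linalg.eigvalsh(R)[::-1]
    print(half_width_deg, np.round(eig[:4] / M, 3))   # largest eigenvalues, normalized by trace
```

Because the trapezoidal weights sum exactly to $2\Delta_\varphi$, every diagonal entry of the computed matrix equals $\beta$, just as in the exact integral.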
Figure 5.29: Illustration of a sparse multipath propagation environment where only signal
components that depart the transmitter in some distinct angular directions $(\varphi_{t,i},\theta_{t,i})$ will reach
the receiver, and only arrive from some distinct angular directions $(\varphi_{r,i},\theta_{r,i})$. The figure shows
$N_{\mathrm{cl}}$ such cluster directions.


We can extend the propagation model to capture a point-to-point MIMO
channel with $N_{\mathrm{cl}}$ clusters between the transmitter and receiver. Cluster $i$ is
located in the direction $(\varphi_{t,i},\theta_{t,i})$ seen from the transmitter and in the direction
$(\varphi_{r,i},\theta_{r,i})$ seen from the receiver. This setup is illustrated in Figure 5.29. Let
$\mathbf{a}_K(\varphi,\theta)\in\mathbb{C}^{K}$ denote the array response vector of the ULA at the transmitter
and $\mathbf{a}_M(\varphi,\theta)\in\mathbb{C}^{M}$ denote the array response vector of the ULA at the
receiver, which can both be modeled as in (4.120). If isotropic antennas are
utilized, then the channel matrix $\mathbf{H}\in\mathbb{C}^{M\times K}$ can be expressed as

$$
\mathbf{H}=\sum_{i=1}^{N_{\mathrm{cl}}}c_i\mathbf{a}_M(\varphi_{r,i},\theta_{r,i})\mathbf{a}_K^{T}(\varphi_{t,i},\theta_{t,i}), \tag{5.187}
$$

where $c_i\sim\mathcal{N}_{\mathbb{C}}(0,\beta_i)$ is an independent random variable that models the rich
multipath within the $i$th cluster. The channel matrix consists of a superposition
of $N_{\mathrm{cl}}$ components, each determined by the angular directions of the multipath
cluster via the array response vectors and having the average channel gain
$\beta_i\in[0,1]$. The rank of this channel matrix is determined by the number of
antennas, the number of multipath clusters, and the angular locations of these
clusters. Since (5.187) is a sum of $N_{\mathrm{cl}}$ rank-one matrices, the rank is at most $N_{\mathrm{cl}}$,
and it can reach $N_{\mathrm{cl}}$ if the multipath clusters are well separated in the angular domain.
However, the rank is also upper bounded by $\min(M,K)$ since $\mathbf{H}$ only has that
many singular values. Hence, the maximum channel rank is $\min(M,K,N_{\mathrm{cl}})$.
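The rank statement can be verified numerically with the following sketch of (5.187). It assumes half-wavelength ULA responses with entries $e^{-j\pi(n-1)\sin(\varphi)}$ ($\theta=0$) at both ends and uses hypothetical, well-separated cluster angles. With three clusters, the rank equals $N_{\mathrm{cl}}=3$. With five clusters and $K=4$ transmit antennas, the rank saturates at $\min(M,K)=4$.

```python
import numpy as np

rng = np.random.default_rng(5)

def ula(n, phi):
    """Assumed half-wavelength ULA response with n antennas (theta = 0)."""
    return np.exp(-1j * np.pi * np.arange(n) * np.sin(phi))

def clustered_mimo(M, K, phis_r, phis_t, betas):
    """One realization of (5.187): sum_i c_i a_M(phi_r,i) a_K(phi_t,i)^T."""
    H = np.zeros((M, K), dtype=complex)
    for phi_r, phi_t, b in zip(phis_r, phis_t, betas):
        c = np.sqrt(b / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
        H += c * np.outer(ula(M, phi_r), ula(K, phi_t))
    return H

M, K = 8, 4
# Hypothetical, well-separated cluster angles (degrees) and gains
H3 = clustered_mimo(M, K, np.radians([-40, 5, 50]), np.radians([30, -20, 10]), [0.5, 0.3, 0.2])
H5 = clustered_mimo(M, K, np.radians([-60, -30, 0, 25, 55]),
                    np.radians([-50, -15, 10, 35, 65]), [0.2] * 5)
print(np.linalg.matrix_rank(H3), np.linalg.matrix_rank(H5))   # expected: 3 and 4
```

If two clusters are instead placed at nearly the same angles, the corresponding singular value becomes small, which illustrates why angular separation matters in practice.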
Suppose the transmitter uses directive antennas with the gain function
$G_t(\varphi,\theta)$ and the receiver uses directive antennas with the gain function
$G_r(\varphi,\theta)$, where the angles are defined as for the respective array response
vectors. In line with (4.148), we can extend the channel model in (5.187) as

$$
\mathbf{H}=\sum_{i=1}^{N_{\mathrm{cl}}}c_i\sqrt{G_t(\varphi_{t,i},\theta_{t,i})G_r(\varphi_{r,i},\theta_{r,i})}\,\mathbf{a}_M(\varphi_{r,i},\theta_{r,i})\mathbf{a}_K^{T}(\varphi_{t,i},\theta_{t,i}). \tag{5.188}
$$

Since the multipath clusters are located in different directions, they will be
associated with different gains $G_t(\varphi_{t,i},\theta_{t,i})G_r(\varphi_{r,i},\theta_{r,i})$. However, the variance
$\beta_i$ was already assumed to be cluster-specific, so we can simplify the notation
by absorbing the antenna gains into these variables. Hence, we can define

$$
\bar{c}_i=c_i\sqrt{G_t(\varphi_{t,i},\theta_{t,i})G_r(\varphi_{r,i},\theta_{r,i})}\sim\mathcal{N}_{\mathbb{C}}(0,\bar{\beta}_i), \tag{5.189}
$$

where $\bar{\beta}_i=\beta_iG_t(\varphi_{t,i},\theta_{t,i})G_r(\varphi_{r,i},\theta_{r,i})$, and then use the original channel model
in (5.187) with $\bar{c}_i$ instead of $c_i$.


Example 5.18. How does the Rician fading distribution extend to the clustered
rich multipath propagation scenario?

Rician fading was introduced in Example 5.2 to model channels where
there exists an LOS path in addition to the many NLOS paths. The LOS
path has a particular pair of departure angles $(\varphi_{t,0},\theta_{t,0})$ at the transmitter
and arrival angles $(\varphi_{r,0},\theta_{r,0})$ at the receiver. This is not a cluster since there
is only one path. If we add this path to (5.187), we obtain

$$
\mathbf{H}=\alpha_0e^{-j\psi_0}\mathbf{a}_M(\varphi_{r,0},\theta_{r,0})\mathbf{a}_K^{T}(\varphi_{t,0},\theta_{t,0})+\sum_{i=1}^{N_{\mathrm{cl}}}c_i\mathbf{a}_M(\varphi_{r,i},\theta_{r,i})\mathbf{a}_K^{T}(\varphi_{t,i},\theta_{t,i}), \tag{5.190}
$$

where $\alpha_0\in[0,1]$ models the attenuation and $\psi_0\sim U[-\pi,\pi)$ models the
phase-shift of the LOS path. Each entry $h_{m,k}$ of this matrix has a magnitude
with the Rician distribution, hence the name Rician fading.
When using this model, it is common to let $\beta=\mathbb{E}\{|h_{m,k}|^{2}\}=\alpha_0^{2}+\sum_{i=1}^{N_{\mathrm{cl}}}\beta_i$
denote the average gain of each channel entry. One can then define the
so-called K-factor, which describes how the average gain is divided between the
LOS and NLOS paths: $\kappa=\alpha_0^{2}/\sum_{i=1}^{N_{\mathrm{cl}}}\beta_i$. Using this notation, we can generate
random MIMO channel realizations as

$$
\mathbf{H}=\sqrt{\frac{\kappa\beta}{\kappa+1}}e^{-j\psi_0}\mathbf{a}_M(\varphi_{r,0},\theta_{r,0})\mathbf{a}_K^{T}(\varphi_{t,0},\theta_{t,0})+\sum_{i=1}^{N_{\mathrm{cl}}}c_i\mathbf{a}_M(\varphi_{r,i},\theta_{r,i})\mathbf{a}_K^{T}(\varphi_{t,i},\theta_{t,i}),\quad \sum_{i=1}^{N_{\mathrm{cl}}}\beta_i=\frac{\beta}{\kappa+1}, \tag{5.191}
$$

where the phase of the LOS path and the Rayleigh fading of each cluster are
the sources of randomness.
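Channel realizations with the structure of (5.191) can be generated as in the sketch below. It is not a reference implementation. It assumes half-wavelength ULA responses with unit-modulus entries ($\theta=0$), so that every entry has average gain $\beta$, and it splits the NLOS power $\beta/(\kappa+1)$ over two hypothetical clusters using chosen shares. The Monte Carlo check confirms that each entry has average gain close to $\beta$.

```python
import numpy as np

def ula(n, phi):
    """Assumed half-wavelength ULA response with n antennas (theta = 0)."""
    return np.exp(-1j * np.pi * np.arange(n) * np.sin(phi))

def rician_clustered(M, K, beta, kappa, los, clusters, rng):
    """One realization with the structure of (5.191).

    los = (phi_r0, phi_t0); clusters = [(phi_r, phi_t, share), ...] where the
    hypothetical shares sum to one, so sum_i beta_i = beta / (kappa + 1)."""
    psi0 = rng.uniform(-np.pi, np.pi)                    # random LOS phase
    H = (np.sqrt(kappa * beta / (kappa + 1)) * np.exp(-1j * psi0)
         * np.outer(ula(M, los[0]), ula(K, los[1])))
    for phi_r, phi_t, share in clusters:
        beta_i = share * beta / (kappa + 1)
        c = np.sqrt(beta_i / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
        H += c * np.outer(ula(M, phi_r), ula(K, phi_t))
    return H

rng = np.random.default_rng(7)
M, K, beta, kappa = 4, 2, 1.0, 3.0
los = (np.radians(10), np.radians(-20))
clusters = [(np.radians(-45), np.radians(30), 0.7), (np.radians(40), np.radians(-60), 0.3)]
samples = np.stack([rician_clustered(M, K, beta, kappa, los, clusters, rng)
                    for _ in range(20_000)])
print(np.mean(np.abs(samples) ** 2, axis=0))    # each entry close to beta = 1
```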
5.6.2 Beamspace Representation


Propagation channels can be sparse in the angular domain in the sense that
there is a small number of multipath clusters. Such sparsity is particularly
evident when communicating in the mmWave and THz bands because the
wireless signals then interact less favorably with objects in the propagation
environment. One example is provided in Figure 5.30, where the signal with the
higher frequency is more damped when transmitted through a blocking object
(e.g., propagating through a wall). Although the world between the transmitter
and receiver is the same regardless of the signal frequency, the number of
impactful multipath components can change drastically. The strengths of
the LOS and specularly reflected paths are almost wavelength-independent,
while paths that interact with multiple objects virtually disappear at higher
frequencies, thereby leaving only a few dominant paths. The resulting angular
sparsity is not visible in the MISO channel vector $\mathbf{h}$ in (5.181), where all the
entries have the same average magnitude because each antenna reaches the
receiver with the same average power. However, the sparsity can be extracted
by transforming the channel vector from the antenna domain (where each
entry represents a physical antenna) to the angular domain (where each
entry represents an angular interval). The angular domain representation is
nowadays known as the beamspace [72], but was initially called the virtual
channel representation [73] and has also been named the Weichselberger model
due to Weichselberger's seminal work [74] that generalizes the model and
highlights its connections to beamforming and multiplexing.

We recall from Section 4.3.3 that the columns of the DFT matrix $\mathbf{F}_M$ in
(2.198) generate a grid of orthogonal beams, which spans all angular directions
when using a half-wavelength-spaced ULA with the aperture length $D=M\lambda/2$.
Hence, it acts as an orthogonal basis for the channel, where each basis vector
represents a specific angular interval. We can denote the $n$th column of $\mathbf{F}_M$ as


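$$
\mathbf{f}_{M,n}=\frac{1}{\sqrt{M}}\begin{bmatrix}1\\ e^{-j2\pi(n-1)/M}\\ \vdots\\ e^{-j2\pi(M-1)(n-1)/M}\end{bmatrix}=\begin{cases}\frac{1}{\sqrt{M}}\mathbf{a}_M\!\left(\arcsin\!\left(\frac{2(n-1)}{M}\right),0\right), & n=1,\ldots,\lfloor M/2\rfloor+1,\\[4pt] \frac{1}{\sqrt{M}}\mathbf{a}_M\!\left(\arcsin\!\left(\frac{2(n-1)}{M}-2\right),0\right), & n=\lfloor M/2\rfloor+2,\ldots,M,\end{cases} \tag{5.192}
$$

where $\mathbf{a}_M(\varphi,\theta)\in\mathbb{C}^{M}$ denotes the array response vector in (4.120) for $\Delta=\lambda/2$.
We can express this relation in short form as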


$$
\mathbf{f}_{M,n}=\frac{1}{\sqrt{M}}\mathbf{a}_M\!\left(\arcsin\!\left(\left[\frac{2(n-1)}{M}\right]_{-1,1}\right),0\right),\quad n=1,\ldots,M, \tag{5.193}
$$

where $[\cdot]_{-1,1}$ wraps the argument within the range $(-1,1]$ and is defined as

$$
[x]_{-1,1}=\begin{cases}x, & x\leq 1,\\ x-2, & x>1.\end{cases} \tag{5.194}
$$

Figure 5.30: Signals with a higher frequency are subject to a larger amplitude loss when
transmitted through a blocking object.


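If we multiply the channel vector in (5.181) with the conjugate transpose of
the DFT matrix, we obtain the channel's beamspace representation

$$
\check{\mathbf{h}}=\mathbf{F}_M^{H}\mathbf{h}=\sum_{i=1}^{N_{\mathrm{cl}}}c_i\begin{bmatrix}\mathbf{f}_{M,1}^{H}\mathbf{a}(\varphi_i,\theta_i)\\ \vdots\\ \mathbf{f}_{M,M}^{H}\mathbf{a}(\varphi_i,\theta_i)\end{bmatrix}, \tag{5.195}
$$

where each cluster contributes mainly to one or a few of the entries.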

Figure 5.31 illustrates this transformation in a scenario with M = 10 antennas and N_cl = 4 multipath clusters, each located in one of the DFT beam directions. The DFT beams are numbered from 1 to 10 and are the same as in Figure 4.19(a). Each beam is associated with the interval where it provides the largest beamforming gain. We recall that the main beams partially overlap, so this is an approximate division of the angular domain. As the DFT beams in (5.193) have equally spaced sine values of their azimuth angles, they represent equally spaced spatial frequencies. Beam n is centered around the spatial frequency [2(n − 1)/M]_{-1,1}/λ and covers the interval between [(2(n − 1) − 1)/M]_{-1,1}/λ and [(2(n − 1) + 1)/M]_{-1,1}/λ, which has the width 2/(Mλ) = 1/D that is inversely proportional to the aperture length D. Consequently, the angular intervals are wider in the end-fire directions (i.e., ±π/2) than in the broadside direction. In the propagation scenario illustrated in Figure 5.31, the beamspace representation ĥ of the channel has N_cl = 4 non-zero entries illustrated by the colored boxes, each associated with one of the four clusters. The other six entries of the vector are zero (white boxes). This illustrates how the angular sparsity created by the small number of clusters can be exposed by transforming the channel vector to the beamspace.
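The sparsity in Figure 5.31 can be reproduced with a small sketch. Assuming the same sign conventions as in the previous check (array entries e^{−jπ(m−1) sin(φ)} for θ = 0), the hypothetical scenario below places four clusters exactly in DFT beams 1, 3, 6, and 9 of a ULA with M = 10 antennas, with made-up coefficients c_i, and computes ĥ = F_M^H h. The beam indices and coefficients are illustrative, not the ones used in the figure.

```python
import cmath
import math

M = 10  # antennas in the half-wavelength-spaced ULA


def dft_matrix(M):
    """Unitary DFT matrix F_M; F[m][n] = exp(-j*2*pi*m*n/M)/sqrt(M) (0-indexed)."""
    return [[cmath.exp(-2j * math.pi * m * n / M) / math.sqrt(M) for n in range(M)]
            for m in range(M)]


def array_response(M, sin_phi):
    """Assumed ULA response for theta = 0, parameterized by sin(phi)."""
    return [cmath.exp(-1j * math.pi * m * sin_phi) for m in range(M)]


def to_beamspace(F, h):
    """h_hat = F^H h, i.e., entry n is f_{M,n}^H h."""
    M = len(h)
    return [sum(F[m][n].conjugate() * h[m] for m in range(M)) for n in range(M)]


# Hypothetical scenario: four clusters in DFT beams 1, 3, 6, 9 with made-up coefficients c_i.
beams = [1, 3, 6, 9]
coeffs = [1.0, 0.5j, -0.8, 0.3 + 0.3j]

h = [0j] * M
for n, c in zip(beams, coeffs):
    s = 2 * (n - 1) / M
    s = s if s <= 1 else s - 2                 # wrapping from (5.194)
    a = array_response(M, s)
    h = [hm + c * am for hm, am in zip(h, a)]  # h = sum_i c_i a(phi_i, 0)

h_hat = to_beamspace(dft_matrix(M), h)
nonzero = [n + 1 for n, v in enumerate(h_hat) if abs(v) > 1e-9]
print(nonzero)  # → [1, 3, 6, 9]
```

Each non-zero entry equals √M c_i, since a(φ_i, 0) = √M f_{M,n} when cluster i lies in beam n and the DFT columns are orthonormal.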
Figure 5.31: A transmitter equipped with a half-wavelength-spaced ULA with M = 10 antennas communicates with a single-antenna receiver. The angular domain is divided into M intervals that match the DFT beams in Figure 4.19(a). Each covers an interval of length 2/(Mλ) in terms of spatial frequencies. The beamspace representation ĥ = F_M^H h of the channel vector has N_cl = 4 non-zero entries, each generated by a multipath cluster located in an angular direction that coincides with one of the DFT beams. Note that the size of the transmitter is exaggerated compared to the propagation distances in this figure.


To shed further light on the beamspace representation, we return to the 
NLOS example in Figure 5.28, where the channel contains four paths with 
distinctly different angles. If we transform this channel to the beamspace, 
we obtain 10 channel components, one per DFT beam direction. Figure 5.32 
shows the relative strength of these channel components on a decibel scale. There are six large and four small components; thus, angular sparsity is also prevalent in this scenario. However, the path directions do not precisely match the directions of the DFT beams, which is why several paths are smeared out over multiple beam directions. Moreover, none of the channel components are precisely zero. This is not because weak signals arrive from all directions but because of the side-lobes that show up almost everywhere; when we look for signals in a particular beam direction, we can pick up signals from a very different direction through a side-lobe. The behavior in this example is representative of what we will experience in practical scenarios with angular sparsity.
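The side-lobe leakage can be illustrated with a single off-grid path. The sketch below uses the same assumed sign conventions as before; the sine value 0.3 is a hypothetical choice that lies halfway between beams 2 and 3 for M = 10. It prints the beamspace power of each DFT beam relative to the strongest one.

```python
import cmath
import math

M = 10


def beamspace_power_db(M, sin_phi):
    """Relative power (dB) of f_{M,n}^H a(phi, 0) for n = 1..M (assumed sign convention)."""
    a = [cmath.exp(-1j * math.pi * m * sin_phi) for m in range(M)]
    gains = []
    for n in range(M):
        f = [cmath.exp(-2j * math.pi * m * n / M) / math.sqrt(M) for m in range(M)]
        gains.append(abs(sum(fm.conjugate() * am for fm, am in zip(f, a))) ** 2)
    peak = max(gains)
    return [10 * math.log10(g / peak) for g in gains]


# Hypothetical single path whose sine value 0.3 lies halfway between beams 2 and 3.
levels = beamspace_power_db(M, 0.3)
for n, level in enumerate(levels, start=1):
    print(f"beam {n:2d}: {level:6.1f} dB")
```

For this choice, the closed form |sin(Mx/2)/sin(x/2)| of the underlying geometric sum shows that the two neighboring beams share the peak, beams 1 and 4 are about 9 dB weaker, and the remaining beams lie roughly 13 to 16 dB below the peak, but none of them is zero.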
Figure 5.32: The strength of the 10 channel components when the NLOS channel from
Figure 5.28 is transformed to the beamspace. The channel consists of four propagation paths but 
has six large channel components since some paths are smeared out over multiple DFT beams. 


Example 5.19. Suppose there are N_cl = M clusters that are equally spaced in terms of spatial frequency, such that a(φ_i, θ_i) = √M f_{M,i} for i = 1, ..., M. What is the spatial correlation matrix R_h?

Under the given assumptions, we can express the spatial correlation matrix in (5.183) as

$$
\mathbf{R}_h = \sum_{i=1}^{N_{\mathrm{cl}}} \beta_i \mathbf{a}(\varphi_i, \theta_i) \mathbf{a}^{\mathrm{H}}(\varphi_i, \theta_i) = \sum_{i=1}^{M} \beta_i M \mathbf{f}_{M,i} \mathbf{f}_{M,i}^{\mathrm{H}} = \mathbf{F}_M \mathbf{B} \mathbf{F}_M^{\mathrm{H}}, \tag{5.196}
$$

where B = diag(Mβ_1, ..., Mβ_M). The last expression is the eigendecomposition of the spatial correlation matrix; thus, the columns of the DFT matrix are the eigenvectors and each associated eigenvalue Mβ_i is the total average channel gain from cluster i to all the antennas.
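Example 5.19 can be verified numerically by building R_h from the middle expression in (5.196) and checking that every DFT column is an eigenvector with eigenvalue Mβ_i. The values M = 4 and the gains β_i below are hypothetical, chosen only for illustration.

```python
import cmath
import math

M = 4
betas = [0.4, 0.1, 0.3, 0.2]  # hypothetical cluster gains beta_1, ..., beta_M

# f[i] holds the column f_{M,i+1} of the unitary DFT matrix.
f = [[cmath.exp(-2j * math.pi * m * i / M) / math.sqrt(M) for m in range(M)] for i in range(M)]

# R_h = sum_i beta_i * M * f_{M,i} f_{M,i}^H, the middle expression in (5.196).
R = [[sum(betas[i] * M * f[i][r] * f[i][c].conjugate() for i in range(M)) for c in range(M)]
     for r in range(M)]

residuals = []
for i in range(M):
    Rf = [sum(R[r][c] * f[i][c] for c in range(M)) for r in range(M)]
    lam = M * betas[i]  # predicted eigenvalue M * beta_i
    residuals.append(max(abs(Rf[r] - lam * f[i][r]) for r in range(M)))
    print(f"eigenvalue {lam:.2f}: residual {residuals[-1]:.1e}")
```

The trace of R_h equals M(β_1 + ... + β_M), which is the sum of the eigenvalues, as expected.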


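When using two half-wavelength-spaced ULAs, the MIMO channel matrix in (5.187) can also be transformed to the beamspace by multiplying by DFT matrices of matching dimensions from the left and the right:

$$
\hat{\mathbf{H}} = \mathbf{F}_M^{\mathrm{H}} \mathbf{H} \mathbf{F}_K^{*} = \sum_{i=1}^{N_{\mathrm{cl}}} c_i \begin{bmatrix}
\mathbf{f}_{M,1}^{\mathrm{H}} \mathbf{a}_M(\varphi_{r,i},\theta_{r,i}) \mathbf{a}_K^{\mathrm{T}}(\varphi_{t,i},\theta_{t,i}) \mathbf{f}_{K,1}^{*} & \cdots & \mathbf{f}_{M,1}^{\mathrm{H}} \mathbf{a}_M(\varphi_{r,i},\theta_{r,i}) \mathbf{a}_K^{\mathrm{T}}(\varphi_{t,i},\theta_{t,i}) \mathbf{f}_{K,K}^{*} \\
\vdots & \ddots & \vdots \\
\mathbf{f}_{M,M}^{\mathrm{H}} \mathbf{a}_M(\varphi_{r,i},\theta_{r,i}) \mathbf{a}_K^{\mathrm{T}}(\varphi_{t,i},\theta_{t,i}) \mathbf{f}_{K,1}^{*} & \cdots & \mathbf{f}_{M,M}^{\mathrm{H}} \mathbf{a}_M(\varphi_{r,i},\theta_{r,i}) \mathbf{a}_K^{\mathrm{T}}(\varphi_{t,i},\theta_{t,i}) \mathbf{f}_{K,K}^{*}
\end{bmatrix}. \tag{5.197}
$$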
Figure 5.33: A transmitter equipped with a half-wavelength-spaced ULA with K = 5 antennas communicates with a receiver with a half-wavelength-spaced ULA with M = 10 antennas. The beamspace representation Ĥ = F_M^H H F_K^* of the channel matrix has N_cl = 6 non-zero entries, each generated by a multipath cluster located in an angular direction that coincides with one of the DFT beams at each side. The rank of the channel matrix is 4, which is the number of linearly independent columns in Ĥ. Note that the transmitter and receiver sizes are exaggerated compared to the propagation distances.


Each column of the transformed matrix represents a viable angular transmis- 
sion direction seen from the transmitter, while each row represents a viable 
angular reception direction. Each of the N_cl multipath clusters will appear in one matrix entry or possibly a few neighboring entries. Figure 5.33 shows an example where a transmitter with K = 5 antennas communicates with a receiver with M = 10 antennas. There are N_cl = 6 clusters that connect the transmitter and receiver, and the non-zero entries of Ĥ are illustrated with coloring in the figure. The rank r of the channel matrix is essential to determine
how many data streams can be spatially multiplexed over the channel. As the 
multiplication with unitary matrices (e.g., DFT matrices) does not change the 
rank, we can utilize the beamspace representation when determining the rank. 
The rank is the maximum number of linearly independent columns (or rows) 
of the matrix. In this example, there are four linearly independent columns 
and one empty column; thus, the rank and multiplexing gain are r = 4. 
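Since the DFT matrices are unitary, the rank can be read off the beamspace matrix and then confirmed on the spatial-domain channel. The sketch below assumes the beamspace transform Ĥ = F_M^H H F_K^* from (5.197), so that H = F_M Ĥ F_K^T. It uses a hypothetical placement of six clusters in a 10 × 5 beamspace matrix (one empty column, with the non-zero columns occupying disjoint rows) and computes the numerical rank of both matrices with a simple Gaussian elimination, used here as a stand-in for an SVD-based rank test.

```python
import cmath
import math


def dft(N):
    """Unitary N x N DFT matrix."""
    return [[cmath.exp(-2j * math.pi * m * n / N) / math.sqrt(N) for n in range(N)] for m in range(N)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rank(A, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < tol:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= factor * A[r][j]
        r += 1
    return r


M, K = 10, 5
# Hypothetical clusters: (receive beam, transmit beam) -> coefficient, 1-indexed.
clusters = {(1, 1): 0.9, (3, 1): 0.4j, (2, 2): -0.7, (5, 3): 0.5 + 0.5j,
            (7, 4): 0.3, (8, 4): -0.6j}
H_hat = [[0j] * K for _ in range(M)]
for (m, k), c in clusters.items():
    H_hat[m - 1][k - 1] = c

# Undo H_hat = F_M^H H F_K^*: H = F_M H_hat F_K^T.
H = matmul(matmul(dft(M), H_hat), transpose(dft(K)))
print(rank(H_hat), rank(H))  # → 4 4
```

Columns 1 and 4 each contain two clusters, but since every non-zero column occupies its own rows, the four non-zero columns are linearly independent and the empty fifth column adds nothing.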


The channel rank is determined by how many dimensions the transmitter can reach the receiver through (i.e., the number of non-zero columns of Ĥ) and how many dimensions the receiver can hear the transmitter through (i.e., the number of non-zero rows of Ĥ). Figure 5.34 shows three examples of such beamspace matrices. Case (a) is a rich multipath environment with clusters in all directions, resulting in all entries of Ĥ being non-zero. This channel
Figure 5.34: Three examples of MIMO channels represented in the beamspace. There are different numbers of multipath clusters in these examples. The clusters are distributed differently over the DFT beams, resulting in channels with different ranks and numbers of random coefficients. There are K = 5 transmit antennas and M = 5 receive antennas in all the examples. The white entries of the matrix Ĥ are approximately zero.


has full rank. The MK sources of randomness make it likely that all the singular values are of comparable sizes, making this channel well-suited for spatial multiplexing. Case (b) has many clusters around the transmitter, but the receiver observes all of them through a single DFT beam. This could happen when a transmitting user device is surrounded by scattering objects while the receiving base station is elevated far above them and sees them all from roughly the same direction. The rank of Ĥ is 1, so this MIMO channel only provides beamforming gains. Case (c) has a small number of clusters, but these have well-separated angles that make Ĥ diagonal, so the channel has full rank. Since there are fewer sources of randomness than in Case (a), there
might be significant variations in the singular values. This setup resembles 
the MIMO channel illustrated in Figure 3.16, where each singular value is 
associated with a single cluster. 


The rank of a MIMO channel is fundamentally limited by the aperture lengths at the transmitter and receiver, along with the sizes and locations of the multipath clusters. The DFT beams are equally spaced in the spatial frequency domain from −1/λ to 1/λ. When the ULAs are half-wavelength-spaced, each beam at the transmitter covers a spatial frequency interval of length 2/(Kλ), while each beam at the receiver covers an interval of length 2/(Mλ). The rank is determined by how many such intervals are covered by the multipath clusters, which can be interpreted as the spatial bandwidth of the channel.

Suppose the transmitter is connected to the receiver through N_{t,cl} multipath clusters visible at the transmitter in non-overlapping angular directions. We will let the clusters have arbitrary sizes; thus, integrals are required to obtain the resulting channel covariance matrices. In particular, we let cluster n extend from the angle-of-departure φ_{t,n}^{start} to φ_{t,n}^{end} so that it covers a range of spatial frequencies of length

$$
\Omega_{t,n} = \frac{\left|\sin(\varphi_{t,n}^{\mathrm{end}}) - \sin(\varphi_{t,n}^{\mathrm{start}})\right|}{\lambda}. \tag{5.198}
$$

The total range of spatial frequencies that the multipath clusters cover is Σ_{n=1}^{N_{t,cl}} Ω_{t,n} ∈ [0, 2/λ], which represents the channel's spatial bandwidth from the transmitter perspective. If the spatial bandwidth is divided equally between the beamspace dimensions at the transmitter, the number of non-zero columns in Ĥ will be approximately

$$
\frac{\sum_{n=1}^{N_{t,\mathrm{cl}}} \Omega_{t,n}}{2/(K\lambda)} = \frac{K\lambda}{2} \sum_{n=1}^{N_{t,\mathrm{cl}}} \Omega_{t,n} = D_t \sum_{n=1}^{N_{t,\mathrm{cl}}} \Omega_{t,n}, \tag{5.199}
$$

where D_t = Kλ/2 denotes the aperture length of the transmitter. We divided by 2/(Kλ) since this is the spatial frequency interval represented by each column. The maximum value in (5.199) is D_t · 2/λ = K, which equals the number of transmit antennas and, thereby, the number of columns of Ĥ.¹³

Similarly, suppose the receiver is connected to the transmitter through N_{r,cl} multipath clusters visible at the receiver in non-overlapping angular directions. Cluster n extends from the angle-of-arrival φ_{r,n}^{start} to φ_{r,n}^{end} so that it covers a range of spatial frequencies of length

$$
\Omega_{r,n} = \frac{\left|\sin(\varphi_{r,n}^{\mathrm{end}}) - \sin(\varphi_{r,n}^{\mathrm{start}})\right|}{\lambda}. \tag{5.200}
$$

The total range of spatial frequencies excited by the multipath clusters is Σ_{n=1}^{N_{r,cl}} Ω_{r,n} ∈ [0, 2/λ], which represents the channel's spatial bandwidth from the receiver perspective. If the spatial bandwidth is divided equally between the beamspace dimensions at the receiver, the number of non-zero rows in Ĥ will be approximately

$$
\frac{\sum_{n=1}^{N_{r,\mathrm{cl}}} \Omega_{r,n}}{2/(M\lambda)} = \frac{M\lambda}{2} \sum_{n=1}^{N_{r,\mathrm{cl}}} \Omega_{r,n} = D_r \sum_{n=1}^{N_{r,\mathrm{cl}}} \Omega_{r,n}, \tag{5.201}
$$

where D_r = Mλ/2 denotes the aperture length of the receiver.

These principles are illustrated in Figure 5.35 for a setup with N_{t,cl} = N_{r,cl} = 4 visible multipath clusters at the transmitter and receiver. The widths of the colored intervals at the transmitter and receiver show the ranges of spatial frequencies Ω_{t,n} and Ω_{r,n} that the respective multipath clusters are covering, for n = 1, ..., 4. The same cluster can cover a vastly different spatial frequency range at the transmitter compared to the receiver, depending on its

¹³The maximum value is, for example, achieved when there is a single cluster covering all angles from φ_{t,1}^{start} = −π/2 to φ_{t,1}^{end} = π/2 so that Ω_{t,1} = |sin(φ_{t,1}^{end}) − sin(φ_{t,1}^{start})|/λ = 2/λ.
Figure 5.35: The rank of a MIMO channel matrix with half-wavelength-spaced ULAs is approximately determined by (5.202), which depends on the aperture lengths at the transmitter and receiver, as well as the widths of the spatial frequency ranges Ω_{t,n} and Ω_{r,n} that the multipath clusters are covering at the transmitter and receiver, respectively.


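distance and orientation with regard to each of them. As the channel rank is (at most) the minimum of the number of non-zero columns and the number of non-zero rows, we obtain

$$
\mathrm{rank}(\mathbf{H}) = \mathrm{rank}(\hat{\mathbf{H}}) \approx \min\left( D_t \sum_{n=1}^{N_{t,\mathrm{cl}}} \Omega_{t,n},\; D_r \sum_{n=1}^{N_{r,\mathrm{cl}}} \Omega_{r,n} \right), \tag{5.202}
$$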


which is an approximation because the rank must be integer-valued and the 
edges of the clusters might be divided between multiple matrix entries, which 
could slightly increase the rank. On the other hand, the rank only specifies 
the number of non-zero singular values of the channel matrix but does not 
guarantee that they are of comparable size. The latter depends on the relative 
strengths of the signals traveling through the respective multipath clusters. 
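The approximation (5.202) is simple to evaluate once the angular extents of the clusters are known. The function below is a minimal sketch with hypothetical names; it takes each cluster as a (start, end) angle pair in radians and returns the non-integer approximate rank for half-wavelength-spaced ULAs with K transmit and M receive antennas. The cluster angles are made-up examples.

```python
import math


def spatial_bandwidth(intervals, wavelength):
    """Sum of Omega_n = |sin(phi_end) - sin(phi_start)| / wavelength, cf. (5.198) and (5.200)."""
    return sum(abs(math.sin(end) - math.sin(start)) for start, end in intervals) / wavelength


def approx_rank(K, M, tx_intervals, rx_intervals, wavelength=1.0):
    """Approximation (5.202) for half-wavelength-spaced ULAs."""
    D_t = K * wavelength / 2  # transmit aperture length
    D_r = M * wavelength / 2  # receive aperture length
    return min(D_t * spatial_bandwidth(tx_intervals, wavelength),
               D_r * spatial_bandwidth(rx_intervals, wavelength))


# Hypothetical clusters: one spanning all angles at the transmitter, and one narrow
# cluster whose sine values span [0, 0.4] at the receiver.
tx = [(-math.pi / 2, math.pi / 2)]
rx = [(0.0, math.asin(0.4))]
print(approx_rank(8, 8, tx, rx))  # approximately 1.6, limited by the receiver side
```

Because D_t and D_r scale with λ while each Ω scales with 1/λ, the wavelength cancels when the numbers of antennas K and M are held fixed.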
The MIMO rank analysis can be extended to half-wavelength-spaced UPAs, which can resolve spatial frequencies both horizontally and vertically, as discussed in Section 4.5.3. Previously in this section, we noticed that a half-wavelength-spaced ULA with the aperture length D = Mλ/2 can achieve a maximum rank of M = D · 2/λ, where 2/λ is the maximum range of spatial frequencies in one dimension. The UPA has a horizontal aperture length M_H λ/2 that can be used to resolve horizontal spatial frequencies and a vertical aperture length M_V λ/2 that can be used to resolve vertical spatial frequencies. By decoupling these dimensions, one might expect from the previous analysis that the maximum channel rank is M_H M_V = Area · 4/λ², where Area = M_H (λ/2) · M_V (λ/2) denotes the UPA's aperture area. However, this is incorrect because only some combinations of horizontal and vertical frequencies can coexist. The possible combinations lie within the circle with diameter 2/λ illustrated in Figure 4.42, while the incorrect decoupling argument above considered all combinations in a square with side length 2/λ. The ratio between the areas of these geometrical shapes is π(1/λ)²/(2/λ)² = π/4. Consequently, the actual maximum MIMO channel rank that the UPA can support is Area · π/λ² when Area (in m²) is the aperture area. In other words, each segment with area λ² can add (approximately) π to the channel rank. If two such half-wavelength-spaced UPAs are placed in a realistic environment, the MIMO channel rank is determined by how many spatial frequencies are excited by the multipath clusters; that is, which fractions of the circle with spatial frequencies in Figure 4.42 are excited. The rank becomes approximately Area · min(Ω_t, Ω_r), where Ω_t, Ω_r ∈ [0, π/λ²] denote the total areas of the parts of the circle that are excited at the transmitter and receiver, respectively. A precise derivation requires more extensive mathematical notation, so we refer to [75]-[77] for such details, which can also be used to analyze the MIMO channel rank of arbitrarily shaped antenna arrays.
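The gap between the naive square-based count and the circle-based count is the constant factor π/4. As a worked example for a 9 × 9 half-wavelength-spaced UPA, a short sketch with hypothetical helper names and the wavelength normalized to one:

```python
import math


def upa_max_rank(M_H, M_V, wavelength=1.0):
    """Area * pi / lambda^2 for an M_H x M_V half-wavelength-spaced UPA."""
    area = (M_H * wavelength / 2) * (M_V * wavelength / 2)
    return area * math.pi / wavelength ** 2


def decoupled_guess(M_H, M_V, wavelength=1.0):
    """The incorrect square-based count Area * 4 / lambda^2, which equals M_H * M_V."""
    area = (M_H * wavelength / 2) * (M_V * wavelength / 2)
    return area * 4 / wavelength ** 2


print(decoupled_guess(9, 9), round(upa_max_rank(9, 9), 1))  # → 81.0 63.6
```

Roughly a fifth (1 − π/4 ≈ 0.21) of the 81 antenna dimensions cannot be reached by propagating waves in this idealized count.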

The beamspace representation for a half-wavelength-spaced 9 × 9 UPA
is illustrated in Figure 5.36. We begin by revisiting the NLOS setup from 
Figure 5.28 with four propagation paths having distinct azimuth and elevation 
angles. Figure 5.36(a) shows the real part of the wave impinging on the array. 
The UPA samples the wave at the marked antenna locations. The resulting 
channel can be turned into the beamspace by taking a 2D-DFT of the channel 
coefficients, which results in the 2D spatial frequency spectrum shown in 
Figure 5.36(c). There are four peaks, which match the number of paths. It 
was implicitly assumed in this example that the paths gave rise to plane 
waves. However, the beamspace analysis can be applied regardless of the 
wavefront. Figure 5.36(b) shows the real part of the impinging wave emitted 
from a transmitter at a short distance of 8λ in the broadside direction. There
are large circle-shaped phase variations over the array, which is typical for 
spherical waves. When turning this near-field channel into the beamspace, 
we obtain the 2D spatial frequency spectrum shown in Figure 5.36(d). The 
spectrum contains a range of spatial frequencies centered around zero, while a 
plane wave would only contain the zero-valued frequency. Since the 2D-IDFT 
recreates the channel using the discrete spatial frequencies shown in the figure, 
a spherical wave can be represented as a summation of multiple plane waves. 
A formal connection can be made using the Weyl identity [78]. The bottom 
line is that any channel matrix can be represented in the beamspace. The 
maximum channel rank result holds even in the presence of spherical waves, 
which are summations of many plane waves. 
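The difference between the plane-wave and near-field spectra in Figure 5.36 can be mimicked with a crude sketch. It assumes a transmitter on the broadside axis at 8λ from the center of a 9 × 9 half-wavelength-spaced UPA, ignores amplitude variations over the array, uses a unitary 2D-DFT, and counts the spatial-frequency bins within an arbitrarily chosen 20 dB of the strongest one. None of these modeling choices are taken from the book.

```python
import cmath
import math

N = 9                      # 9 x 9 half-wavelength-spaced UPA
wavelength = 1.0
spacing = wavelength / 2
distance = 8 * wavelength  # broadside distance from the array center (assumed)
coords = [(i - (N - 1) / 2) * spacing for i in range(N)]  # centered antenna positions


def spherical_channel():
    """Phase-only near-field channel exp(-j*2*pi*r/lambda) using the exact distance r."""
    return [[cmath.exp(-2j * math.pi * math.sqrt(distance ** 2 + y ** 2 + z ** 2) / wavelength)
             for y in coords] for z in coords]


def plane_channel():
    """Broadside plane wave: the same phase at every antenna."""
    return [[1.0 + 0j for _ in coords] for _ in coords]


def dft2(X):
    """Unitary 2D-DFT of an N x N array."""
    n = len(X)
    return [[sum(X[a][b] * cmath.exp(-2j * math.pi * (a * p + b * q) / n)
                 for a in range(n) for b in range(n)) / n
             for q in range(n)] for p in range(n)]


def significant_bins(X, threshold_db=-20.0):
    """Number of spatial-frequency bins within threshold_db of the strongest bin."""
    S = dft2(X)
    peak = max(abs(v) for row in S for v in row)
    return sum(1 for row in S for v in row
               if 20 * math.log10(abs(v) / peak + 1e-300) > threshold_db)


print(significant_bins(plane_channel()), significant_bins(spherical_channel()))
```

The plane wave occupies a single bin, while the spherical wave spreads over several bins around zero spatial frequency, consistent with representing it as a sum of plane waves.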
(c) 2D-DFT of the channel from (a). (d) 2D-DFT of the channel from (b).

Figure 5.36: The wave that impinges on a UPA can have many different shapes. The real parts of two waves are shown in (a) and (b) for different kinds of propagation channels. The observed horizontal and vertical spatial frequencies differ for these channels. When the UPA has 9 × 9 half-wavelength-spaced antennas, the spatial frequencies shown in (c) and (d) are observable.
Example 5.20. How does the wavelength impact the rank of the channel matrix if the multipath clusters remain the same?

The rank expression in (5.202) for a ULA can be expressed as

$$
\min\left( \frac{D_t}{\lambda} \sum_{n=1}^{N_{t,\mathrm{cl}}} \left|\sin(\varphi_{t,n}^{\mathrm{end}}) - \sin(\varphi_{t,n}^{\mathrm{start}})\right|,\; \frac{D_r}{\lambda} \sum_{n=1}^{N_{r,\mathrm{cl}}} \left|\sin(\varphi_{r,n}^{\mathrm{end}}) - \sin(\varphi_{r,n}^{\mathrm{start}})\right| \right), \tag{5.203}
$$

which depends on the wavelength λ through the normalized aperture lengths D'_t = D_t/λ and D'_r = D_r/λ of the transmitter and receiver, respectively. If we keep the aperture lengths D_t and D_r constant (i.e., constant array sizes in meters), the rank is inversely proportional to the wavelength. Hence, we obtain a larger channel rank as the wavelength shrinks (e.g., from the low-band to the mmWave band). To achieve this, we need a larger number of antennas since the antenna spacing λ/2 is also wavelength-dependent, which explains why the spatial resolution improves so that the same clusters generate more channel dimensions. If we instead keep the normalized aperture lengths D'_t and D'_r fixed, then the rank becomes independent of the wavelength. This is achieved by using a fixed number of half-wavelength-spaced antennas.
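The scaling in Example 5.20 can be illustrated with numbers. The sketch below evaluates (5.203) for hypothetical 0.5 m apertures and clusters whose sine values span 0.2 and 0.1 at both ends, at 3 GHz and 30 GHz, and also reports how many half-wavelength-spaced antennas that aperture requires. The speed of light is rounded to 3 · 10⁸ m/s, and the function names are illustrative.

```python
C = 3e8  # speed of light in m/s (rounded)


def rank_fixed_size(D_t, D_r, tx_spans, rx_spans, freq_hz):
    """(5.203): D_t, D_r in meters; spans are |sin(phi_end) - sin(phi_start)| per cluster."""
    lam = C / freq_hz
    return min(D_t / lam * sum(tx_spans), D_r / lam * sum(rx_spans))


def antennas_needed(D, freq_hz):
    """Half-wavelength-spaced ULA with aperture D = M * lambda / 2, so M = 2 * D / lambda."""
    return round(2 * D / (C / freq_hz))


# Hypothetical setup: 0.5 m apertures, clusters with sine spans 0.2 and 0.1 at both ends.
spans = [0.2, 0.1]
for f in (3e9, 30e9):
    print(f"{f / 1e9:.0f} GHz: M = K = {antennas_needed(0.5, f)}, "
          f"rank approx {rank_fixed_size(0.5, 0.5, spans, spans, f):.1f}")
```

Moving from 3 GHz to 30 GHz multiplies both the approximate rank and the number of antennas by ten, which matches the inverse proportionality to λ described above.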


An i.i.d. Rayleigh fading channel matrix obtained using half-wavelength-spaced ULAs can also be transformed to the beamspace. The entries of Ĥ will remain independent and identically distributed because the multiplications with the unitary DFT matrices in (5.197) do not change the distribution. This reflects that, under i.i.d. Rayleigh fading, the multipath components are uniformly distributed over all angular directions instead of being clustered. The maximum diversity order of a MIMO channel equals the number of distinguishable sources of independent randomness, which is MK under i.i.d. Rayleigh fading and (approximately) equal to the number of non-zero entries of Ĥ under clustered scattering. In Figure 5.34, Case (a) can represent i.i.d. Rayleigh fading if all the entries are identically distributed. In this case, the maximum diversity order is MK = 25. The channel matrices in Case (b) and Case (c) have 5 non-zero entries; thus, the maximum diversity order is 5. We need to use transmit diversity in Case (b) since the sources of randomness are only distinguishable from the transmitter's viewpoint. In contrast, it is sufficient to exploit receive diversity in Case (c), while the transmitter can send the same signal in each DFT beam direction. The diversity order generally equals the number of (large) non-zero entries of the beamspace channel matrix. The diversity gain increases with the number of antennas (with Δ = λ/2) since the improved spatial resolution will divide a multipath cluster between multiple matrix entries in the beamspace representation, thereby creating additional distinguishable sources of randomness.
Figure 5.37: The ergodic rate as a function of the SNR with either i.i.d. Rayleigh fading or correlated fading, where the beamspace representation of the channel matrix only contains a non-zero block of size 2 × 2 (i.e., the rank is 2). The correlation reduces the multiplexing gain compared to i.i.d. Rayleigh fading, but a beamforming gain is still achieved.


The capacity with clustered scattering can be evaluated similarly to the cases with i.i.d. Rayleigh fading. In particular, the ergodic capacity expression in (5.129) can be applied when the receiver has perfect CSI while the transmitter has no CSI. However, the optimal covariance matrix R_x of the transmitted signal will not be a scaled identity matrix but will depend on the clusters seen from the transmitter. For example, in a MISO scenario in which the channel distribution h ~ N_C(0, R_h) is given in (5.182), it can be shown that the optimal covariance matrix R_x has the same eigenvectors as R_h and allocates power between these dimensions based on how large their eigenvalues are. We refer to [79], [80] for the precise details. When the transmitter also has perfect CSI, we can compute the maximum rate for a single channel realization using Theorem 3.1 (i.e., transmitting in the directions of the right singular vectors and using the water-filling power allocation) and then compute the mean value over the fading channel.

Figure 5.37 shows the ergodic capacity achieved with M = K = 4 antennas when the transmitter and receiver have perfect CSI. A channel with i.i.d. Rayleigh fading is compared with a correlated channel where Ĥ contains a 2 × 2 non-zero submatrix, representing multipath clusters that cover half of the DFT beams. The channel matrix in the correlated scenario has the rank r = 2 while the rank is r = 4 with i.i.d. Rayleigh fading, which implies that the largest singular value is generally larger under correlated fading. This results in a slightly higher capacity at low SNRs, but the benefit is lost at higher SNRs where the larger multiplexing gain leads to a faster capacity growth
under i.i.d. Rayleigh fading. The capacity with M = K = 2 and i.i.d. Rayleigh fading is shown as a reference. It achieves the same multiplexing gain as in the correlated scenario, but the capacity curve is shifted to the right by 6 dB since the beamforming gain is 4 times smaller. In general, clustered multipath propagation has a detrimental impact on the ergodic capacity compared to i.i.d. fading, but spatial correlation is a naturally occurring characteristic that must be considered when modeling practical channels.
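The 6 dB shift is the decibel value of the factor-of-4 difference in beamforming gain. Assuming that the two curves have the same multiplexing gain and differ only by this factor in effective SNR, the arithmetic is:

```latex
\Delta_{\mathrm{dB}} = 10 \log_{10}(4) = 20 \log_{10}(2) \approx 6.02\ \mathrm{dB}
```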


5.6.3 Fading Channels with Dual-Polarized Antennas 


The i.i.d. Rayleigh fading model was derived earlier in this chapter under the
implicit assumption of using single-polarized antenna arrays. That kind of 
fading cannot be achieved with dual-polarized antennas because the channel 
between a transmit antenna and a receive antenna is generally stronger if 
their polarization matches compared to if they have opposite polarizations; 
hence, the fading is not identically distributed. The channel model derived in 
Section 4.6.3 for LOS MIMO channels with dual-polarized antennas can be 
readily generalized for NLOS channels with clustered multipaths. That model 
was constructed for a scenario with K/2 dual-polarized transmit antennas and 
M/2 dual-polarized receive antennas. The antennas were numbered so that 
transmit antennas 1, ..., K/2 and receive antennas 1, ..., M/2 have matching
polarization, while the same holds for transmit antennas K/2+1,..., K and 
receive antennas M/2+1,...,M. 

Suppose there are N_cl multipath clusters between the transmitter and receiver, of which cluster i is located in the direction (φ_{t,i}, θ_{t,i}) seen from the transmitter and in the direction (φ_{r,i}, θ_{r,i}) seen from the receiver. This is the same notation as in (5.187). The channel component through each cluster can be modeled similarly to (4.176) but with additional Rayleigh fading coefficients that model the random amplitudes and phases. We obtain the channel matrix

$$
\mathbf{H} = \sum_{i=1}^{N_{\mathrm{cl}}} \begin{bmatrix} \sqrt{1-\kappa}\, c_{i,1,1} & \sqrt{\kappa}\, c_{i,1,2} \\ \sqrt{\kappa}\, c_{i,2,1} & \sqrt{1-\kappa}\, c_{i,2,2} \end{bmatrix} \otimes \left( \mathbf{a}_{M/2}(\varphi_{r,i}, \theta_{r,i})\, \mathbf{a}_{K/2}^{\mathrm{T}}(\varphi_{t,i}, \theta_{t,i}) \right), \tag{5.204}
$$

where c_{i,1,1}, c_{i,1,2}, c_{i,2,1}, c_{i,2,2} ~ N_C(0, β_i) are independent Rayleigh fading coefficients and β_i ∈ [0, 1] models the average channel gain of cluster i. The channel has a limited XPD in the sense that there is leakage between the orthogonal polarizations, characterized by the parameter κ. The diagonal entries √(1−κ) c_{i,1,1}, √(1−κ) c_{i,2,2} characterize the signal propagation that maintains its polarization, while the off-diagonal entries √κ c_{i,1,2}, √κ c_{i,2,1} characterize the leakage between the polarizations. We recall that κ = 0 represents perfect discrimination, while κ = 0.5 is the worst-case situation. In the LOS scenario
considered in Section 4.6.3, the limited XPD was caused by imperfect isolation 
within the transmitter and receiver hardware. The situation is different in 
NLOS scenarios. Each reflection/scattering in a multipath environment can shift the wave's polarization, which creates further leakage that is factored into the parameter κ in the considered model; that is, κ is likely larger in NLOS setups than in LOS setups.

The primary implication of adding a second polarization is that it provides 
extra sources of randomness since the two polarizations fade independently, 
even if the XPD creates an additional kind of spatial correlation.¹⁴ Hence,
dual polarization can double the channel matrix’s rank in a propagation 
environment with few multipath clusters. It can also double the diversity 
order (or even quadruple it, thanks to the imperfect XPD). Polarization has 
been utilized since the early days of mobile communications [56], [58] to 
enhance performance and reliability. 
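The rank doubling can be checked for a single cluster. The sketch below builds one term of (5.204) with a Kronecker product, using hypothetical cluster directions, β = 1, κ = 0.2, and M/2 = K/2 = 4 dual-polarized antennas per side, and compares it with a single-polarized 8 × 8 channel c a_8 a_8^T through the same cluster. The ULA sign convention, the random seed, and all helper names are assumptions made for illustration.

```python
import cmath
import math
import random


def ula(n, sin_phi):
    """Assumed half-wavelength ULA response with n antennas and theta = 0."""
    return [cmath.exp(-1j * math.pi * m * sin_phi) for m in range(n)]


def outer_t(a, b):
    """a b^T without conjugation."""
    return [[ai * bj for bj in b] for ai in a]


def kron(A, B):
    """Kronecker product of two nested-list matrices."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


def rank(A, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < tol:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= factor * A[r][j]
        r += 1
    return r


def cn(var):
    """One sample from N_C(0, var)."""
    s = math.sqrt(var / 2)
    return complex(random.gauss(0, s), random.gauss(0, s))


random.seed(1)
beta, kappa = 1.0, 0.2    # hypothetical cluster gain and XPD leakage parameter
sin_r, sin_t = 0.3, -0.5  # hypothetical cluster directions (sine values)

# Single-polarized reference: 8 receive and 8 transmit antennas, one cluster.
c = cn(beta)
H_single = [[c * x for x in row] for row in outer_t(ula(8, sin_r), ula(8, sin_t))]

# Dual-polarized: 4 antennas per polarization and side, one term of (5.204).
c11, c12, c21, c22 = (cn(beta) for _ in range(4))
P = [[math.sqrt(1 - kappa) * c11, math.sqrt(kappa) * c12],
     [math.sqrt(kappa) * c21, math.sqrt(1 - kappa) * c22]]
H_dual = kron(P, outer_t(ula(4, sin_r), ula(4, sin_t)))

print(rank(H_single), rank(H_dual))  # → 1 2
```

The 2 × 2 polarization block is almost surely full rank, and rank(A ⊗ B) = rank(A) rank(B), so a single cluster yields rank 2 instead of 1.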

We have only considered single- and dual-polarized antennas so far. As 
wireless signals propagate in three dimensions, up to three mutually orthogonal 
linear polarization dimensions exist, which represent the x, y, and z axes in a
coordinate system. We can only utilize two of these in LOS communications 
because the polarization must be perpendicular to the direction the waves 
propagate to the receiver. However, NLOS channels allow waves to follow 
widely different paths from the transmitter to the receiver. The signal can leave 
the transmitter in any direction and reach the receiver from any direction, 
and objects in the environment might allow a signal with any polarization to 
(partially) maintain that polarization when reaching the receiver. It is possible 
to build tri-polarized antennas, for example, with one antenna pointing along 
each axis in the coordinate system. The channel measurements reported in 
[82], [83] confirm the viability of building tri-polarized MIMO communication 
systems. However, the improvement in data rate is slight compared to having 
optimally rotated dual-polarized antennas. Hence, the main benefit is that 
one can keep a consistent rate regardless of how the user device is rotated, 
while the challenge is to integrate tri-polarized antennas into the form factor 
of a base station and device. 


¹⁴Many measurements show that there exists a slight correlation between the fading realizations c_{i,1,1}, c_{i,1,2}, c_{i,2,1}, c_{i,2,2}, which is not captured by the model used in this chapter. There
can also exist imbalances between the variances of the two polarization dimensions since the 
multipath distributions generally differ horizontally and vertically. The latter effect can be 
reduced by using slanted polarization [56], [57]. We refer to [81] for further details and ways to 
enrich the MIMO channel model to include such characteristics. 
5.7 Exercises


Exercise 5.1. We used the central limit theorem to motivate that a SISO system has the channel response h ~ N_C(0, β) in a rich multipath environment. In this exercise, we will revisit the technical conditions that underpin that h = Σ_{i=1}^{L} α_i e^{jη_i} → N_C(0, β) when the number of paths L is very large. We assume that the path attenuations α_i are independent and identically distributed, the phases η_i ~ U[−π, π) are independent, and β > 0 is a constant.

(a) Suppose the path gains satisfy E{α_i²} = β/L², for i = 1, ..., L. What is the variance of h when L → ∞? Is h complex Gaussian distributed?

(b) Suppose the path gains satisfy E{α_i²} = β/L, for i = 1, ..., L. What is the variance of h when L → ∞? Is h complex Gaussian distributed?

(c) Suppose the path gains satisfy E{α_i²} = β, for i = 1, ..., L. What is the variance of h when L → ∞? Is h complex Gaussian distributed?


Exercise 5.2. The spatial correlation expression in (5.24) is derived for isotropic scattering, for which the multipath components are uniformly distributed over the unit sphere as stated in (5.17). When other distributions are used, the spatial correlation must be computed differently. One such example is the Clarke model, where the joint PDF of the azimuth and elevation angles is


$$
f(\varphi, \theta) = \frac{1}{2\pi}\, \delta(\theta), \quad -\pi \leq \varphi < \pi. \tag{5.205}
$$


(a) For a ULA located along the z-axis with the channel response in (5.15), obtain the correlation E{h_m h_n^*} between the channel realizations at antennas m and n.

(b) Express the correlation in (a) using the zeroth-order Bessel function of the first kind, defined as

$$
J_0(x) = \frac{1}{\pi} \int_{-\pi/2}^{\pi/2} e^{jx \sin(\phi)}\, d\phi. \tag{5.206}
$$


Exercise 5.3. Consider the setup in Figure 5.7 for which the SISO channel coefficient is 
given in (5.29) as a function of time: h(t) = 2a cos(274*). What is the coherence time if 
we define it as the time it takes to move from a peak to losing half the received power? 


Exercise 5.4. Consider the SISO channel in (5.35) with slow fading, where y[l] = h · x[l] + n[l]. Suppose the channel coefficient h is a realization of a random variable that
is zero with probability p and one with probability 1 — p. 


(a) What is the outage probability of this channel? Express the answer as a function 
of the desired rate R. 


(b) Suppose we instead have two receive antennas that observe independent channel 
realizations, each with the mentioned distribution. What is the outage probability 
for the desired rate R? 
Exercise 5.5. Consider the SISO channel in (5.35) with slow fading, where y[l] = h · x[l] + n[l]. The channel coefficient has the uniform distribution h ~ U[−1, 1].


(a) Derive the outage probability of this channel. Express the answer as a function of 
the desired rate R > 0. 


(b) Suppose we have M receive antennas and these observe the same channel realization 
h. What is the outage probability in this case? 


(c) Derive expressions for the ε-outage capacities for the setups in (a) and (b). Sketch a graph of the expressions for M = 1, M = 4, and M = 10 with ε on the horizontal axis and the ε-outage capacity on the vertical axis for q/N_0 = 1.


Exercise 5.6. Consider the SISO channel in (5.35) with slow fading, where y[l] = h · x[l] + n[l]. Compute the outage probability and ε-outage capacity for the following fading distributions.

(a) The channel gain |h|² has the PDF

$$
f_{|h|^2}(x) = \begin{cases} 2(1-x), & 0 \leq x \leq 1, \\ 0, & \text{otherwise}. \end{cases} \tag{5.207}
$$


(b) The channel gain |h|² has the PDF


ae OS e <1, 


Finj2 (2) = Pa 


7 (5.208) 
0, otherwise. 


Exercise 5.7. Consider a SIMO system with M antennas. 


(a) The receiver has a hardware limitation that only allows it to use one of its antennas at a time, known as antenna selection. Which antenna should the receiver select to maximize the rate for given realizations of h_1, ..., h_M? Provide an expression for the maximum rate.

(b) Suppose the channel is subject to slow i.i.d. Rayleigh fading and only the receiver knows the channel realization. Formulate the outage probability when communicating at a given rate R. Hint: Use the identity

$$
\Pr\left\{\max\{|h_1|^2, \ldots, |h_M|^2\} < x\right\} = \Pr\left\{|h_1|^2 < x, \ldots, |h_M|^2 < x\right\}.
$$

(c) Compare the outage probability of the antenna selection scheme with that of MRC given in (5.52). Which one is larger?

(d) Derive the high-SNR slope of the outage probability curve when using antenna selection. Compare the results with the high-SNR slope achieved with MRC. Hint: Use the approximation e^{−x} ≈ 1 − x, which is tight when x is small.


Exercise 5.8. The diversity order of a channel can be defined as

$$
\lim_{\mathrm{SNR} \to \infty} \frac{-\ln\left(P_{\mathrm{out}}(R)\right)}{\ln(\mathrm{SNR})}, \tag{5.209}
$$

where SNR denotes the SNR and the outage probability P_out(R) is a function of the SNR for any fixed value of R. Use the exact expression of the outage probability in (5.53) for a SIMO channel and verify that the diversity order is M according to this definition. Hint: Use the identity e^x = Σ_{m=0}^{∞} x^m/m! and L'Hôpital's rule.
Exercise 5.9. Consider a MISO system with M = 2 antennas and slow fading, where 
the receiver knows the channel but not the transmitter. The channel coefficients are 
distributed as $h_1, h_2 \sim \mathcal{N}_{\mathbb{C}}(0, \beta)$ but are correlated in the sense that $h_1 = h_2$ for every 
channel realization. Use the Alamouti code for transmission and compute the outage 
probability, both the exact value and an upper bound that exposes the diversity order. 
Compare the diversity order with what is achieved when $h_1$ and $h_2$ are independent. 
Hint: The expression in (5.73) holds for any channel distribution, so the solution can 
start from that point. 


Exercise 5.10. One popular definition of channel hardening is that the fading SIMO/MISO 
channel vector $\mathbf{h} \in \mathbb{C}^M$ must satisfy 

$$
\frac{\|\mathbf{h}\|^2}{\mathbb{E}\{\|\mathbf{h}\|^2\}} \to 1 \quad \text{as } M \to \infty. \tag{5.210}
$$

This means that all realizations of $\|\mathbf{h}\|^2$ will be close to the mean value $\mathbb{E}\{\|\mathbf{h}\|^2\}$ when 
there are many antennas. The convergence in (5.210) can be proved in the mean-squared 
sense by computing the variance of $\|\mathbf{h}\|^2 / \mathbb{E}\{\|\mathbf{h}\|^2\}$ and showing that it goes to zero as $M \to \infty$. 
Follow this approach to prove that an i.i.d. Rayleigh fading channel provides channel 
hardening. 


Exercise 5.11. Consider a slow-fading SISO channel with a repetition scheme where the 
same signal $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$ is transmitted over $L$ time slots. 


(a) Compute the conditional capacity for a given realization of the channel coefficient 
$h$. 


(b) Compare the capacity obtained in (a) with a conventional slow-fading SISO 
channel without a repetition scheme. What value of $L$ maximizes the capacity? 
Hint: Use the inequality $\frac{x}{1+x} < \ln(1+x)$, for $x > 0$. 


(c) Obtain a low-SNR approximation of the capacity in (a). How does it depend on 
$L$? 


Exercise 5.12. Consider MISO and SIMO channels with slow fading and M transmit 
and receive antennas, respectively, under i.i.d. Rayleigh fading. Only the receiver knows 
the channel realization. Due to hardware limitations, we can only adjust the phases of 
the precoding/combining vectors, so MRT/MRC is not possible. This is called equal 
gain beamforming. 


(a) Consider the MISO case and show that the full transmit diversity order can be 
achieved using a repetition scheme where the same signal is repeated using M 
different orthogonal beams. 


(b) Consider the SIMO case and propose a way to achieve the full receive diversity 
order. 


Exercise 5.13. Consider a MISO channel with slow fading and i.i.d. Rayleigh fading. 
There are M = 2 transmit antennas and the Alamouti code is used. 


(a) Use the low-SNR approximation of the conditional capacity to compute a low-SNR 
approximation of the outage probability expression. 

(b) Compare the result in (a) with the low-SNR approximation of the outage proba- 
bility for the corresponding SISO channel (without the Alamouti code). Which 
one is better in terms of outage performance at low SNR? Hint: Use $e^x > x + 1$, 
for $x > 0$. 
Exercise 5.14. Consider a SIMO system with M antennas. The symbol power is q and 
the noise power spectral density is $N_0$. 


(a) Suppose the channel vector is $\mathbf{h} = [1, \ldots, 1]^{\mathrm{T}}$. What is the capacity of the channel? 
What is the performance gain compared to a corresponding SISO system with $h = 1$? 
Exemplify the performance gain at low and high SNRs. What is this kind of 
performance gain called? 


(b) Suppose the channel vector $\mathbf{h}$ is instead subject to i.i.d. Rayleigh fading, so that 
$\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_M)$. Consider a fast-fading scenario where the receiver knows the 
channel but not the transmitter. What is the capacity of the channel? What are 
the performance gains compared to a corresponding fast-fading SISO channel with 
$h \sim \mathcal{N}_{\mathbb{C}}(0, 1)$? 


(c) Compare the capacities in (a) and (b). Which one is larger? What happens 
with the performance difference as $M \to \infty$? Hint: Use the channel hardening 
property in (5.210). 


Exercise 5.15. Consider the SISO channel with fast fading in (5.103), where the received 
signal at time instance $l$ is $y[l] = h[l]\,x[l] + n[l]$. The channel coefficient $h[l]$ is a 
binary random variable taking the value 1 with probability $p$ and 0 with probability 
$1 - p$. A new independent realization of $h[l]$ is drawn at each time instance $l$, and the 
receiver knows the realization. 


(a) What is the ergodic capacity of this SISO channel? 


(b) Consider a fast-fading SIMO channel with M antennas and the channel vector 
$\mathbf{h}[l] = [h[l], \ldots, h[l]]^{\mathrm{T}}$. This vector takes a new independent realization at every 
time instance, but all the entries in the vector are always identical. What 
is the ergodic capacity of this channel? 


(c) Consider a fast-fading SIMO channel with M antennas. The channel coefficients 
are independent and identically distributed according to the distribution specified 
above. What is the ergodic capacity in this case? Hint: Use that $\|\mathbf{h}[l]\|^2$ is the 
sum of the independent Bernoulli random variables $|h_m[l]|^2$ and write the 
expectation using the binomial sum formula. 


Exercise 5.16. Consider an i.i.d. Rayleigh fading MIMO channel with fast fading. The 
received signal at time instance $l$ is $\mathbf{y}[l] = \mathbf{H}[l]\mathbf{x}[l] + \mathbf{n}[l]$, where the noise is colored 
with the distribution $\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0 \mathbf{C})$ and $\mathbf{C}$ is a non-singular matrix. The channel 
takes an independent realization at each time instance, and only the receiver knows the 
realization. Determine the ergodic capacity. Hint: Use whitening. 


Exercise 5.17. The fast-fading MIMO capacity is expressed in (5.132) in terms of the 
non-zero eigenvalues of $\mathbf{H}\mathbf{H}^{\mathrm{H}}$. Use a high-SNR approximation to express the ergodic 
capacity in terms of mean values involving those eigenvalues. 
Exercise 5.18. Consider an i.i.d. Rayleigh fading MIMO channel with block fading. We 
must estimate the channel coefficients in each block to perform spatial multiplexing. 
Suppose $L_c$ is the length of the coherence block and that $M$ and $K$ are larger than $L_c$. The 
pilot length $L_p$ is a variable that must be smaller than $L_c$. 


(a) Which multiplexing gain can we achieve for a given value of $L_p$? What is the 
pre-log factor in the ergodic capacity expression, which includes the multiplexing 
gain and the penalty from the pilot transmission? 


(b) Which value of $L_p$ maximizes the pre-log factor in (a)? What is the multiplexing 
gain in this case? 


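Exercise 5.19. Consider a MISO channel with $N_{\mathrm{cl}}$ multipath clusters, which has the 
channel vector 

$$
\mathbf{h} = \sum_{i=1}^{N_{\mathrm{cl}}} c_i \mathbf{a}(\varphi_i, \theta_i), \tag{5.211}
$$

where $c_1, \ldots, c_{N_{\mathrm{cl}}}$ are i.i.d. $\mathcal{N}_{\mathbb{C}}(0, \beta/N_{\mathrm{cl}})$ random variables. The transmitter has a ULA 
deployed along the y-axis with $\Delta = \lambda/2$ antenna spacing. The angles $(\varphi_i, \theta_i)$ are 
deterministic and different for every $i$ (i.e., $\mathbf{a}(\varphi_i, \theta_i)$ is a different vector for every $i$). 


(a) Determine if this channel provides channel hardening by following the approach 
from Exercise 5.10 for a fixed number $N_{\mathrm{cl}} < \infty$ of multipath clusters. Hint: Use 
the trace property given in (2.52). 


(b) Determine if this channel provides channel hardening when $N_{\mathrm{cl}} \to \infty$. Hint: It 
holds that $\mathrm{Var}\{\|\mathbf{h}\|^2\} = \mathrm{tr}(\mathbf{R}^2)$ if $\mathbf{h} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{R})$ for any covariance matrix $\mathbf{R}$. 
Utilize beamwidth-like expressions to prove that $\frac{1}{M}|\mathbf{a}^{\mathrm{H}}(\varphi_i, \theta_i)\mathbf{a}(\varphi_k, \theta_k)| \to 0$ as 
$M \to \infty$ when $k$ and $i$ are different. 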


Exercise 5.20. Consider two half-wavelength-spaced ULAs with 4 antennas deployed 
in parallel. What is the minimum number of multipath clusters needed to achieve a 
full-rank channel matrix if all the paths in a cluster have the same angle? Suggest angle 
values for these clusters to achieve full rank. Hint: Use the DFT matrix. 


Exercise 5.21. A half-wavelength-spaced ULA with M antennas is deployed along the 
y-axis. Consider the one-ring model from Example 5.17 with $\Delta_\theta = 0$, so that the 
multipath components are uniformly distributed in the azimuth plane in an interval of 
width $2\Delta_\varphi > 0$. 


(a) Generalize this one-ring model to have $N_{\mathrm{cl}}$ non-overlapping clusters in the azimuth 
plane. Cluster $i$ is centered around the azimuth angle $\varphi_i$, for $i = 1, \ldots, N_{\mathrm{cl}}$, and 
has the average channel gain $\beta_i$. Hence, the angular density function for cluster $i$ 
is 

$$
f_i(\varphi) = \begin{cases} \frac{\beta_i}{2\Delta_\varphi}, & \text{if } |\varphi - \varphi_i| \leq \Delta_\varphi, \\ 0, & \text{otherwise}. \end{cases} \tag{5.212}
$$


Derive an expression for the spatial correlation matrix Rp. 


(b) Use the width of the clusters to approximate the rank of the matrix Rp. 
Exercise 5.22. The Kronecker model is a classic model for spatially correlated Rayleigh 
fading MIMO channels. The channel matrix is then given as 

$$
\mathbf{H} = \mathbf{R}_r^{1/2} \mathbf{W} \left(\mathbf{R}_t^{1/2}\right)^{\mathrm{T}}, \tag{5.213}
$$

where the entries of $\mathbf{W} \in \mathbb{C}^{M \times K}$ are i.i.d. $\mathcal{N}_{\mathbb{C}}(0, \beta)$ random variables. The normalized 
spatial correlation matrices $\mathbf{R}_r$ and $\mathbf{R}_t$ have unit diagonal entries and characterize the 
spatial correlation among the channel realizations at the receive and transmit antennas, 
respectively. Let the eigendecompositions of $\mathbf{R}_r$ and $\mathbf{R}_t$ be denoted as $\mathbf{R}_r = \mathbf{U}_r \mathbf{D}_r \mathbf{U}_r^{\mathrm{H}}$ 
and $\mathbf{R}_t = \mathbf{U}_t \mathbf{D}_t \mathbf{U}_t^{\mathrm{H}}$, respectively. The eigenvalues $\lambda_{r,1}, \ldots, \lambda_{r,M}$ and $\lambda_{t,1}, \ldots, \lambda_{t,K}$ are 
located along the diagonals of $\mathbf{D}_r$ and $\mathbf{D}_t$, respectively, in descending order. 


(a) Compute $\check{\mathbf{H}} = \mathbf{U}_r^{\mathrm{H}} \mathbf{H} \mathbf{U}_t^{*}$ and simplify the expression to show that its entries are 
independently distributed. This is a beamspace matrix similar to $\check{\mathbf{H}} = \mathbf{F}_M^{\mathrm{H}} \mathbf{H} \mathbf{F}_K^{*}$ 
in (5.197). 


(b) How do the variances vary between the entries of $\check{\mathbf{H}}$? Can any beamspace matrix 
be expressed using the Kronecker model? 


Exercise 5.23. Consider an antenna array in an isotropic rich scattering propagation 
environment where the multipath components cover all angular dimensions uniformly. 


(a) Consider a 2-antenna ULA with $\Delta = \lambda/2$. What is the spatial correlation matrix? 
Hint: Use the correlation expression in (5.24). 


(b) Consider a $2 \times 2$ UPA with $\Delta = \lambda/2$ vertical and horizontal spacing. What is the 
spatial correlation matrix? Hint: The coordinate system can be rotated arbitrarily, 
so the expression in (5.23) can be used when considering any pair of antennas. 


(c) For which kinds of arrays will isotropic rich scattering imply i.i.d. fading? 


Chapter 6 


Capacity of Multi-User MIMO Channels 


In this chapter, we will characterize the communication performance over 
multi-user MIMO channels, also known as point-to-multipoint and multipoint- 
to-point MIMO channels. We begin by explaining why the capacity gains 
achieved by point-to-point MIMO are limited in many practical scenarios, as 
a motivation for identifying an alternative way to use multiple antennas. The 
focus will then be on serving multiple users connecting to the same wireless 
system, which raises the question of whether the users should be assigned 
orthogonal or non-orthogonal transmission resources. To answer this, we will 
extend the capacity concept to the multi-user setting and discover how the 
use of multiple antennas radically changes the situation. We will adapt the 
precoding, combining, and power allocation schemes from previous chapters 
to maximize performance in the multi-user context. The tradeoff between 
non-linear and linear signal processing methods will finally be explored. 


6.1 A Practical Issue with Point-to-Point MIMO Systems 


In the previous two chapters, we have derived the capacity of point-to-point 
MIMO channels in both LOS and NLOS scenarios. The largest capacity im- 
provement from having multiple antennas is achieved through the multiplexing 
gain. If we have M receive antennas and K transmit antennas, the capacity 
ideally becomes min(M, K) times larger than in a corresponding SISO channel. 
This can lead to a huge performance improvement if the channel satisfies two 
properties: 


1. The channel matrix H has min(M, K) singular values of similar size; 
2. The SNR is high. 


Unfortunately, these properties seldom appear at the same time in practice. 
When the SNR is large, it is often because the channel contains one dominant 
propagation path (e.g., a LOS path) while the remaining paths are substantially 
weaker. Hence, H has only one or two large singular values (depending on 
whether single-polarized or dual-polarized antennas are considered), while 
the remaining ones are much smaller and potentially zero-valued regardless 
of how many antennas are deployed. In NLOS scenarios with isotropic rich 
scattering, the channel matrix will instead have full rank, and the singular 
value variations are quite small. However, the SNR is relatively low since 
a large fraction of the transmitted power disappears due to the multipath 
propagation; thus, only beamforming gains might be practically useful. Hence, 
LOS and NLOS propagation typically provide the opposite conditions of what 
is preferable from a theoretical perspective. At low SNRs, we would prefer a 
low-rank channel to make the most out of the beamforming gain, while we 
want a high-rank channel with many similar singular values to make the most 
out of the multiplexing gain at high SNRs. 


Figure 6.1 exemplifies these issues by considering a point-to-point MIMO 
channel with K = 4 transmit antennas and M = 4 receive antennas. We 
compare an NLOS case with i.i.d. Rayleigh fading, where the ergodic capacity 
is computed using (5.131), and a single-polarized LOS case where the capacity 
is computed according to (4.96). The capacity of a non-fading SISO channel 
is shown as a reference. The use of multiple antennas provides the most 
significant capacity gains compared to the SISO case when considering NLOS 
channels with high SNR and LOS channels with low SNR. However, these 
events are unlikely to happen in practice. The figure indicates two more likely 
events: LOS with high SNR and NLOS with low SNR. In both cases, there are 
clear gains compared to the SISO channel, but they are still modest compared 
to what could be achieved in those SNR ranges. 
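The qualitative behavior described for Figure 6.1 can be reproduced with a short Monte Carlo sketch. It is an illustration under stated assumptions, not the code behind the figure: the SNR refers to a SISO link with unit channel gain, the i.i.d. Rayleigh channel has CN(0,1) entries with the power split equally over the K transmit antennas (no channel knowledge at the transmitter), and the rank-one LOS channel is assumed to deliver its full array gain MK through beamforming at both ends. The function names are hypothetical.

```python
import numpy as np

def capacity_siso(snr):
    """Capacity in bit/s/Hz of a non-fading SISO channel."""
    return np.log2(1 + snr)

def capacity_los_rank_one(snr, M, K):
    """Assumed rank-one LOS channel: its single singular value yields array gain M*K."""
    return np.log2(1 + M * K * snr)

def ergodic_capacity_iid_rayleigh(snr, M, K, num_realizations=5000, seed=0):
    """Monte Carlo estimate of E{log2 det(I_M + (snr/K) H H^H)} with i.i.d. CN(0,1) entries."""
    rng = np.random.default_rng(seed)
    shape = (num_realizations, M, K)
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    # Equal power per transmit antenna, since the transmitter lacks channel knowledge
    G = np.eye(M) + (snr / K) * (H @ H.conj().transpose(0, 2, 1))
    _, logabsdet = np.linalg.slogdet(G)  # natural log of |det|, one value per realization
    return np.mean(logabsdet) / np.log(2)

M = K = 4
for snr_db in (-10, 0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)
    print(f"{snr_db:>4} dB: SISO {capacity_siso(snr):6.2f}, "
          f"LOS {capacity_los_rank_one(snr, M, K):6.2f}, "
          f"NLOS {ergodic_capacity_iid_rayleigh(snr, M, K):6.2f} bit/s/Hz")
```

Sweeping the SNR should show the LOS curve ahead at low SNR and the NLOS curve ahead at high SNR, which is the mismatch with practical SNR conditions discussed above.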


In summary, an ideal point-to-point MIMO system operates at high SNR 
and achieves a multiplexing gain. However, in practice, we are likely to mainly 
achieve beamforming (and diversity) gains either because the SNR is low or the 
channel has a low rank. Reality can be slightly better than was illustrated in 
Figure 6.1 because LOS channels can contain a few strong reflected paths useful 
for spatial multiplexing, while NLOS channels feature clustered multipath 
propagation where a few directions provide better SNRs. Yet, the nature of 
signal propagation seems to prevent point-to-point MIMO systems from reaching 
their “full capacity”, except in short-range NLOS scenarios. 


Another practical issue with having multiple antennas at both the trans- 
mitter and receiver is that one of them is usually a handheld user device. The 
number of antennas that can fit into such a device is limited for aesthetic 
reasons, particularly in the low-band and mid-band spectrum. This obser- 
vation should not be interpreted as beamforming gains being pointless; on 
the contrary, practical cellular networks are designed and deployed to make 
good use of them. Recall from Section 3.1 that point-to-point SISO systems 
can either be power-limited (i.e., operate at low SNR) or bandwidth-limited 
(i.e., operate at high SNR). The capacity of a power-limited SISO system can 
be greatly improved by adding antennas to achieve a beamforming gain that 
increases the SNR. This is practically relevant in systems operating over large 
distances (i.e., with a small channel gain per antenna) and/or using large 
bandwidths (e.g., in the high-band spectrum). Beamforming is probably a 
prerequisite for systems operating in the mmWave and THz bands because we 
need aperture lengths similar to those in the lower bands to achieve similar SNRs, 
which calls for using antenna arrays. However, beamforming gains have a 
limited impact on the capacity of bandwidth-limited SISO systems; thus, adding 
antenna arrays to such systems might only be worthwhile if we can achieve 
multiplexing gains. 


Figure 6.1: A comparison of the capacities of different MIMO channels with M = K = 4 
antennas. There is i.i.d. Rayleigh fading in the NLOS case, while there is a rank-one channel in 
the LOS case. Since practical channels with high SNRs are often LOS channels, while NLOS 
channels experience low SNRs, the large potential capacity gains of point-to-point MIMO 
channels over SISO channels are hard to achieve in practice. 
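The difference between power-limited and bandwidth-limited operation can be quantified with a few lines of code. As an illustrative assumption, an array with M antennas is taken to multiply the SISO SNR by an ideal beamforming gain of M, so that the capacity becomes B log2(1 + M SNR). At low SNR, this is roughly M times the SISO capacity, whereas at high SNR it only adds about log2(M) bit/s/Hz. The helper name and the value M = 16 are hypothetical.

```python
import math

def capacity(snr, beamforming_gain=1.0, bandwidth_hz=1.0):
    """B*log2(1 + gain*SNR); the ideal beamforming gain is an illustrative assumption."""
    return bandwidth_hz * math.log2(1 + beamforming_gain * snr)

M = 16  # hypothetical number of antennas
for snr in (1e-3, 1e3):
    c_siso = capacity(snr)
    c_bf = capacity(snr, beamforming_gain=M)
    print(f"SNR = {snr:g}: ratio {c_bf / c_siso:.2f}, "
          f"difference {c_bf - c_siso:.2f} bit/s/Hz")
```

The ratio is close to M at the low SNR, while the difference is close to log2(16) = 4 bit/s/Hz at the high SNR.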


An early indication of how to achieve multiplexing gains also in LOS 
scenarios was provided in Figure 4.28, where a ULA transmits to a receiver 
equipped with distributed antennas. Strictly speaking, this is not a point-to- 
point MIMO system but a point-to-multipoint MIMO system because the 
receive antennas were located at multiple geographically distributed points. 
While a user device will only exist at one point, deploying base stations at 
different points and letting them cooperate to serve a user is practically viable. 
This is called coordinated multipoint transmission [52] or Cell-free MIMO [2]. 
We refer to those references for further details since this chapter considers a 
different scenario: a base station at one point communicates with multiple 
user devices, each located at a geographically different point. This is known 
as multi-user MIMO communications. 
6.2 Capacity Definition in Uplink and Downlink 


In the rest of this chapter, we consider a setup where a base station deployed 
at a fixed location serves K user devices. This could represent an entire 
communication system or a single cell in a larger cellular network with many 
base stations that serve different geographical regions (cells). The users can 
move around in the coverage area; thus, the base station must adapt its 
transmissions to the current set of users. There are two relevant directions of 
communication. The transmission from the base station to the user devices 
is called the downlink, inspired by the fact that base stations are typically 
deployed at elevated locations and transmit down toward the users. Similarly, 
the transmission from the user devices to the base station is called the uplink. 
These communication directions are also known as the forward link and reverse 
link, respectively, especially in contexts where the down/up analogy is not 
applicable (e.g., when a ground-based base station serves flying objects). In 
information theory, the downlink of a multi-user system is known as the 
broadcast channel, while the uplink is called the multiple access channel [42]. 
The downlink and uplink are illustrated in Figure 6.2. The base station is 
shown as equipped with an antenna array capable of directing beams toward 
each user device, a feature we will explore later in this chapter. If there are 
NLOS channels to the users, the radiated signals will not look like angular 
beams, as discussed in Section 5.6. The user devices radiate signals (almost) 
isotropically, but only the parts directed toward the base station are indicated. 
Both the downlink and uplink will be analyzed in this chapter. 

The downlink is a point-to-multipoint system where we transmit from one 
point (the base station location) to multiple points (the K user locations). It 
resembles the point-to-multipoint example in Figure 4.28, where a transmitter 
communicated with a receiver equipped with distributed antennas. However, 
two fundamental properties make the downlink setup different from an op- 
erational perspective. Firstly, each user only has access to its own received 
signal and not those at antennas belonging to other user devices. Secondly, 
the users want to access different data and are not interested in the data 
intended for others. Hence, each user measures its performance in terms of its 
individual channel capacity. In a system with K users, there are K different 
capacities to consider. This makes the system design more complicated, and 
we will develop a theory in this chapter to manage it. 

Let $R_k$ bit/s be the variable denoting an achievable data rate of user $k$. 
From previous chapters, we know how to characterize the user’s capacity when 
the user is alone in the system; for example, Theorem 3.1 gives the capacity 
when the base station and user operate as a point-to-point MIMO system. 
We will denote that capacity value by $C_k^{\mathrm{su}}$ bit/s, where su indicates that this 
is the single-user capacity. Hence, we know that the range of achievable data 
rates for user $k$ is 

$$
0 \leq R_k \leq C_k^{\mathrm{su}}. \tag{6.1}
$$
(a) Downlink (or forward link). 


(b) Uplink (or reverse link). 


Figure 6.2: In a cellular network, a fixed-location base station serves mobile user devices in a 
given coverage area called a cell. It can transmit beams toward the users in the downlink and 
receive signals from the users in the uplink, either simultaneously or sequentially. 


It is usually impossible for two users to achieve their single-user capacities 
simultaneously because they share transmission resources, namely the time, 
frequency, transmit power, and spatial dimensions. Hence, a tradeoff exists 
between the performance different users can achieve, which must be modeled 
and dealt with in the system design. We will characterize this tradeoff in 
different scenarios but begin by describing the framework that can quantify 
it: the rate region. This is a set $\mathcal{R} \subset \mathbb{R}^K$ containing all the combinations of 
rates $(R_1, \ldots, R_K)$ that are simultaneously achievable in a given system (i.e., 
for the given channel conditions and transmission resources). 


Figure 6.3: Example of a rate region (shaded) for K = 2 users containing all the rate points 
$(R_1, R_2)$ that are simultaneously achievable in a multi-user system. The points on the Pareto 
boundary are the ones of practical interest. The points that give maximum sum capacity and 
max-min fairness (i.e., the minimum rate among the users is maximized) are indicated. 


Figure 6.3 illustrates a rate region for a setup with K = 2 users, where 
the yellow-shaded region shows all the combinations/tuples of rates $(R_1, R_2)$ 
that are simultaneously achievable. This includes the single-user capacity points $(C_1^{\mathrm{su}}, 0)$ 
and $(0, C_2^{\mathrm{su}})$. It also includes many different tradeoffs between these points, 
where one user reduces its rate to allow the other user to increase its 
rate. If a point $(R_1, R_2)$ is inside the region, then any other point $(R_1', R_2')$ 
that satisfies $0 \leq R_1' \leq R_1$ and $0 \leq R_2' \leq R_2$ is also inside the region. The 
intuition is that we can always purposely reduce the users’ data rates and 
still obtain an achievable system operation. However, the interesting question 
is: how can we simultaneously make the rates as large as possible? The points 
on the Pareto boundary, which is the curved portion of the outer boundary, 
are of particular interest because at these points the rate cannot 
be improved for any user without deteriorating the rate for at least one other 
user. This stands in contrast to the points in the interior of the rate region, 
where we can improve the rates for some users without reducing the rates of 
other users. The Pareto boundary is formally defined as follows. 


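Definition 6.1. The Pareto boundary $\partial\mathcal{R}$ of the $K$-dimensional rate region 
$\mathcal{R}$ consists of all points $(R_1, \ldots, R_K) \in \mathcal{R}$ for which there does not exist any 
point $(R_1', \ldots, R_K') \in \mathcal{R} \setminus \{(R_1, \ldots, R_K)\}$ with $R_k' \geq R_k$ for $k = 1, \ldots, K$. 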


Since there are K user rates, but we can only operate the system in one 
way, there is no objectively optimal way of operating a multi-user system. The 
Pareto boundary is the closest characterization of optimality that we can 
obtain because any point $(R_1, \ldots, R_K) \in \mathcal{R}$ that is not on the Pareto boundary 
is suboptimal in the sense that there exist other rate points $(R_1', \ldots, R_K') \in \partial\mathcal{R}$ 
that are better or at least as good for every user. However, there are generally 
infinitely many points on the Pareto boundary. Hence, when designing the 
system, we need to make a subjective tradeoff between the rates achieved by 
the different users. Each point on the Pareto boundary represents one Pareto 
optimal tradeoff between the K user rates, but they are mutually unordered. 
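Definition 6.1 can be applied directly to a finite list of achievable rate tuples, for example points sampled from a simulated rate region. The sketch below uses hypothetical helper names and made-up rate values; it keeps exactly those tuples for which no other tuple is at least as good for every user. Its O(N²K) cost is fine for illustration but not for large point sets.

```python
from typing import List, Sequence, Tuple

RatePoint = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b for every user and a differs from b."""
    return all(x >= y for x, y in zip(a, b)) and tuple(a) != tuple(b)

def pareto_boundary(points: List[RatePoint]) -> List[RatePoint]:
    """Keep the points that no other point dominates (Definition 6.1)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical rate tuples (R1, R2) for K = 2 users
points = [(4.0, 0.0), (3.0, 2.0), (2.0, 2.0), (2.4, 2.4), (0.0, 3.0), (1.0, 1.0)]
print(pareto_boundary(points))
# prints [(4.0, 0.0), (3.0, 2.0), (2.4, 2.4), (0.0, 3.0)]
```

The point (2.0, 2.0) is discarded because (3.0, 2.0) is at least as good for both users, and (1.0, 1.0) lies in the interior for the same reason.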

To address this issue rigorously, the system designer can select a utility 
function $u(R_1, \ldots, R_K)$ that takes any rate point $(R_1, \ldots, R_K)$ and provides 
a real scalar number representing the preference for that point; a larger value 
represents a larger preference. Since a larger rate should be desirable for the 
system, the function should be increasing or non-decreasing with respect to all 
the rates. The choice of function imposes a subjective ordering of the points 
in the rate region, which allows us to identify an operating point as 
the optimum with respect to the selected utility function. The corresponding 
optimization problem can be stated as 

$$
\underset{(R_1, \ldots, R_K) \in \mathcal{R}}{\text{maximize}} \quad u(R_1, \ldots, R_K). \tag{6.2}
$$

This is called a resource allocation problem since the goal is to determine 
how the transmission resources (i.e., the time, frequency, power, and spatial 
resources) are allocated to the K users. Depending on the utility and scenario, 
there might be one or multiple solutions to (6.2). We will exemplify two ways 
of selecting the utility function based on very different design principles. 


6.2.1 Max-Min Fairness 


The first example focuses on delivering equal rates to all users and making 
that common rate value as large as possible. This utility function can be 
defined as 

$$
u(R_1, \ldots, R_K) = \min_{k \in \{1, \ldots, K\}} R_k, \tag{6.3}
$$

where the preference value is the lowest rate among all the users. This 
function never decreases when any rate grows, but it is not strictly increasing 
because only an improvement of the lowest rate increases the function value. 
Hence, it is a non-decreasing function of the individual users’ rates. 
Substituting this utility into (6.2) results in the max-min fairness problem 

$$
\underset{(R_1, \ldots, R_K) \in \mathcal{R}}{\text{maximize}} \quad \min_{k \in \{1, \ldots, K\}} R_k. \tag{6.4}
$$


By solving this problem, we will identify a point $(R_1, \ldots, R_K) \in \mathcal{R}$ that 
satisfies $R_1 = R_2 = \ldots = R_K$ since the utility gives no incentive to assign a 
larger rate to any user than to the other ones.¹ This condition is the equation 
of a line that passes through the origin and has the slope +1 in all dimensions. 
When illustrated in two dimensions, as in Figure 6.4, this line has a 45° angle 
to both axes. Solving the optimization problem in (6.4) entails identifying 
the point on this line that provides the largest rate values (i.e., is furthest 
from the origin) but belongs to the rate region. Hence, the optimum is the 
intersection point between the line and the Pareto boundary, as illustrated in 
Figure 6.4. The optimal rate is denoted by $R_{\mathrm{mmf}}$ in the figure and satisfies 
$R_1 = R_2 = R_{\mathrm{mmf}}$ and $(R_{\mathrm{mmf}}, R_{\mathrm{mmf}}) \in \partial\mathcal{R}$. Hence, it is relatively easy to 
identify the optimum by searching along the given line. The practical challenge 
is that it can be computationally complex to compute the region and to find 
how the transmission resources should be allocated to achieve a certain point 
in the region. Several detailed examples of how to characterize rate regions 
will be provided later in this chapter. 


1 There exist special cases where there are multiple solutions to the max-min fairness problem. 
All of them provide the same max-min rate value, but some provide higher rates for a 
subset of the users. This happens when the Pareto boundary only constitutes a subset of the 
outer boundary because some segments of the outer boundary are parallel to some of the axes. 
An example of this is shown later in Figure 6.8. 


Figure 6.4: Example of a rate region for K = 2 users containing all the rate points $(R_1, R_2)$ 
that are simultaneously achievable. If the max-min utility optimization in (6.4) is used to find 
the preferred operating point, the optimum (marked by a red star) is the intersection between 
the Pareto boundary and a line through the origin with a slope of +1. At the optimum, the 
users will have equal rates, denoted by $R_{\mathrm{mmf}}$. 
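The search along the 45° line can be sketched as a bisection. The sketch assumes that a membership test for the rate region is available and that the region is downward closed, as argued earlier, so that every common rate below the optimum is achievable. The quarter-ellipse region with single-user capacities C1 = 4 and C2 = 3 is a hypothetical stand-in, chosen because its max-min rate has the closed form (1/C1² + 1/C2²)^(-1/2) = 2.4.

```python
def in_hypothetical_region(rates, c1=4.0, c2=3.0):
    """Hypothetical region: R1, R2 >= 0 and (R1/C1)^2 + (R2/C2)^2 <= 1."""
    r1, r2 = rates
    return r1 >= 0 and r2 >= 0 and (r1 / c1) ** 2 + (r2 / c2) ** 2 <= 1

def max_min_rate(is_achievable, num_users, upper, tol=1e-9):
    """Bisection along R1 = ... = RK for the largest achievable common rate.

    Assumes a downward-closed region, that the origin is achievable, and that
    the common rate `upper` is not achievable (e.g., above every single-user capacity).
    """
    low, high = 0.0, upper
    while high - low > tol:
        mid = 0.5 * (low + high)
        if is_achievable([mid] * num_users):
            low = mid   # (mid, ..., mid) lies in the region: search higher
        else:
            high = mid  # outside the region: search lower
    return low

r_mmf = max_min_rate(in_hypothetical_region, num_users=2, upper=10.0)
print(f"R_mmf = {r_mmf:.6f}")  # closed form for this region: 2.4
```

The line search itself is cheap; as noted above, the hard part in practice is the membership test, which requires knowing how to allocate the transmission resources.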

The max-min utility results in the egalitarian solution to resource alloca- 
tion, which builds on the principle that all users have equal rights, in this 
case referring to the right to equal communication performance. When the 
users have widely different channel qualities (i.e., widely different single-user 
capacities), users with weak channel gains will achieve a larger fraction of their 
single-user capacities than users with strong channel gains. This principle can 
be observed in Figure 6.4, where user 1 has a stronger channel than user 2 but 
gets the same rate $R_{\mathrm{mmf}}$. One can debate whether that is a fair decision, but 
it reinforces the point that resource allocation decisions are always subjective. 


6.2.2 Maximum Sum Rate 


The max-min fairness problem focuses on achieving short-term fairness by 
allocating the transmission resources to give the users equal rates at every 
time instance, for the current set of active users and their current channel 
conditions. This can lead to the undesired side effect that adding a single user 
with weak channel conditions to the system will throttle the performance of 
all other users. An alternative approach is to assume that the users will move 
around in the same coverage area over time and thereby switch between being 
the one with a strong channel gain and the one with a weak channel gain. 
To achieve long-term fairness, it is preferable to transmit as many bits per 
second as possible at every time instance, irrespective of how the sum rate is 
currently divided between the users. On average, as the users move around in 
the cell, they will be allocated an equal share of the long-term average rate. 

Based on this logic, we should maximize the sum of the users’ rates, 
represented by the utility function 

$$
u(R_1, \ldots, R_K) = \sum_{k=1}^{K} R_k. \tag{6.5}
$$

This is a strictly increasing function of all the user rates. Substituting this 
utility into (6.2) results in the sum-rate maximization problem 

$$
\underset{(R_1, \ldots, R_K) \in \mathcal{R}}{\text{maximize}} \quad \sum_{k=1}^{K} R_k. \tag{6.6}
$$
By solving this problem, we will identify a point $(R_1, \ldots, R_K) \in \mathcal{R}$ that 
satisfies $R_1 + R_2 + \ldots + R_K = R_{\mathrm{sr}}$ for the largest possible sum-rate value $R_{\mathrm{sr}}$. 
Using linear algebra terminology, this condition is the equation of a hyperplane 
of dimension $K - 1$. When illustrated for K = 2 users, as in Figure 6.5, it 
becomes the equation $R_2 = R_{\mathrm{sr}} - R_1$ of a line with the slope −1. It intersects 
the two axes at $(R_{\mathrm{sr}}, 0)$ and $(0, R_{\mathrm{sr}})$, and the line makes a 45° angle with both 
axes. From the geometric perspective, the challenge in sum-rate maximization 
is to find the value $R_{\mathrm{sr}}$ so that the line touches the Pareto boundary without 
entering the region’s interior. The intersection point(s) are the sum-rate 
maximizing solution(s) to the resource allocation problem. Finding such a 
point is relatively easy in two dimensions, but as K increases, it entails 
moving around a $(K - 1)$-dimensional hyperplane to find where it touches a 
$K$-dimensional rate region; this can be as computationally complicated as it 
sounds. Hence, sum-rate maximization is generally a computationally complex 
problem to solve, but there exists a wealth of algorithms [84], [85]. 
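For K = 2, the geometric search can be mimicked by sampling the Pareto boundary and picking the point with the largest R1 + R2. The sketch reuses a hypothetical quarter-ellipse region with C1 = 4 and C2 = 3, whose Pareto boundary is (C1 cos θ, C2 sin θ) for θ in [0, π/2]. The exact optimum is R_sr = sqrt(C1² + C2²) = 5 at (C1², C2²)/5 = (3.2, 1.8), so the stronger user 1 gets the larger rate. Brute-force sampling like this does not scale to many users, which is why dedicated algorithms are needed.

```python
import numpy as np

def sample_boundary(c1=4.0, c2=3.0, num_points=100_001):
    """Sampled Pareto boundary of the hypothetical region (R1/C1)^2 + (R2/C2)^2 <= 1."""
    theta = np.linspace(0.0, np.pi / 2, num_points)
    return np.column_stack((c1 * np.cos(theta), c2 * np.sin(theta)))

boundary = sample_boundary()
sum_rates = boundary.sum(axis=1)   # R1 + R2 for every sampled boundary point
best = np.argmax(sum_rates)
r1_sr, r2_sr = boundary[best]
print(f"R_sr = {sum_rates[best]:.4f} at (R1, R2) = ({r1_sr:.4f}, {r2_sr:.4f})")
```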

The sum-rate utility results in the utilitarian solution to resource allocation, 
which builds on the principle that an efficient system produces as much value 
(or goods) as possible using the given resources, in this case, measured in 
bits transferred per second. The allocation of the value between the users is 
not part of the utility function. Hence, when the users have widely different 
channel qualities (i.e., widely different single-user capacities), users with strong 
channels will achieve larger rates than users with weak channels. This principle 
is illustrated in Figure 6.5, where user 1 has a stronger channel gain than 
user 2. As noted earlier, the short-term differences in rates will average out if 
the users move around the same coverage area in the same way.


In practice, the users of a communication system will likely move around in the coverage
area according to different distributions and spend a large fraction of time in their respective
homes and workplaces. Moreover, different data services might be of importance at different
locations. In summary, selecting an appropriate utility function can be very challenging. One
possible solution is to consider the weighted sum rate $u(R_1,\ldots,R_K) = \sum_{k=1}^{K} w_k R_k$, where the
weights $w_k > 0$ are tuned depending on the users' locations, requested data services, and recent
rates to maximize the users' perceived quality-of-service [86], [87].
Figure 6.5: Example of a rate region for $K=2$ users containing all the rate points $(R_1,R_2)$
that are simultaneously achievable. If the sum-rate optimization in (6.6) is used to find the
preferred operating point, the optimum (marked by a red square) is the intersection between
the Pareto boundary and a line with slope $-1$. All the points on this line provide
$R_1+R_2=R_{\mathrm{sr}}$, but only one point $(R_{1,\mathrm{sr}},R_{2,\mathrm{sr}})\in\partial\mathcal{R}$ can be achieved by the system.
6.3 Uplink Communications


We will now consider different ways to operate the uplink of a multi-user 
system in terms of different resource allocation solutions and the number of 
antennas at the base station. Since many different rate expressions will be 
presented and compared, we define the function 


$$C(x) = B\log_2(1+x). \tag{6.7}$$


This is the capacity of a discrete memoryless channel with bandwidth $B$ and
SNR $x$, and we recall from Chapter 3 that the capacity has this form in SISO,
SIMO, and MISO scenarios. We will use this concise notation to explore how
different uplink solutions attain different effective SNRs $x$ and, thereby,
different rates $C(x)$. We will compare three types of operation:
orthogonal multiple access, non-orthogonal multiple access, and multi-user MIMO.


6.3.1 Orthogonal Multiple Access 


We begin by considering the classical setup where a single-antenna base
station receives signals from $K$ single-antenna user devices that share a
communication channel with bandwidth $B$ Hz. The channel gain of user $k$ is
denoted by $\beta_k\in[0,1]$, for $k=1,\ldots,K$. It then follows from (2.146) that the
single-user capacity of user $k$ is

$$C_k = C\left(\frac{P\beta_k}{BN_0}\right) = B\log_2\left(1+\frac{P\beta_k}{BN_0}\right) \text{ bit/s}, \tag{6.8}$$


where P is the maximum transmit power of the user. The transmission 
resources involved in this multi-user system are time, frequency, and power. 
Each user has a separate power amplifier and maximum transmit power 
P; thus, the resources that can be divided between the users are time and 
frequency. In this section, we consider orthogonal multiple access (OMA), 
where the users are assigned orthogonal time-frequency resources. 

We first consider frequency-division multiple access (FDMA), which
is an OMA scheme where the users are assigned non-overlapping fractions of
the bandwidth $B$. We let $\xi_k\in[0,1]$ denote the bandwidth fraction allocated to
user $k$, for $k=1,\ldots,K$. These fractions can be selected arbitrarily under the
constraint $\xi_1+\xi_2+\ldots+\xi_K\le 1$ so that each bandwidth portion is assigned
to at most one user. All users transmit continuously over their assigned bands;
thus, user $k$ experiences a point-to-point system with bandwidth $\xi_k B$. By
replacing $B$ with $\xi_k B$ in (6.8), the data rate of user $k$ becomes

$$R_k(\xi_k) = \xi_k C\left(\frac{P\beta_k}{\xi_k BN_0}\right) = \xi_k B\log_2\left(1+\frac{P\beta_k}{\xi_k BN_0}\right) \text{ bit/s}, \tag{6.9}$$


where the notation $R_k(\xi_k)$ emphasizes that the rate is a function of the fraction
of the total bandwidth assigned to the user. It can be shown (by computing the
first-order derivative) that $R_k(\xi_k)$ is an increasing function of $\xi_k$, which agrees
with the intuition that using as much bandwidth as possible is preferable. It is
crucial to note that only the pre-log factor in (6.9) increases with $\xi_k$, while
the SNR $\frac{P\beta_k}{\xi_k BN_0}$ decreases with $\xi_k$. This highlights that we can increase the
SNR by concentrating the transmit power in a narrower
frequency band, where there is less noise.
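Both effects can be checked numerically with a short Python sketch of (6.7) and (6.9). The full-band SNR $P\beta_k/(BN_0)=10$ and $B=10$ MHz are assumed example values, and the helper names `capacity` and `fdma_rate` are hypothetical.

```python
import numpy as np

def capacity(x, B):
    """C(x) = B * log2(1 + x) in bit/s, as in (6.7)."""
    return B * np.log2(1.0 + x)

def fdma_rate(xi, snr, B):
    """Rate (6.9) of a user that is assigned the bandwidth fraction xi.

    snr = P * beta_k / (B * N0) is the SNR the user would have over the full band.
    """
    return xi * capacity(snr / xi, B)

B = 10e6          # 10 MHz
snr = 10.0        # assumed full-band SNR of the user
xis = np.linspace(0.05, 1.0, 20)
rates = fdma_rate(xis, snr, B)
effective_snr = snr / xis     # SNR inside the allocated band shrinks as xi grows

for xi, r, s in zip(xis[::5], rates[::5], effective_snr[::5]):
    print(f"xi={xi:.2f}  SNR={s:6.1f}  rate={r / 1e6:5.2f} Mbit/s")
```

The printout shows the rate growing with $\xi_k$ even though the in-band SNR falls, because the pre-log factor grows faster than the logarithm shrinks.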

Based on the rate expression in (6.9), we can define the rate region as

$$\mathcal{R} = \left\{\left(R_1(\xi_1),\ldots,R_K(\xi_K)\right) : \xi_1,\ldots,\xi_K\ge 0,\ \xi_1+\ldots+\xi_K\le 1\right\}, \tag{6.10}$$

which is the continuous set of all points $(R_1(\xi_1),\ldots,R_K(\xi_K))$ that can be
obtained by dividing the bandwidth between the users in different ways. While
the rate of user $k$ in (6.9) is an increasing function of the fraction $\xi_k$ assigned
to this user, it is independent of how the remaining bandwidth fraction $1-\xi_k$
is assigned to the other users. Hence, only the constraint $\xi_1+\xi_2+\ldots+\xi_K\le 1$
creates a tradeoff between the users' rates. Whenever there is equality in this
constraint, we will obtain a Pareto optimal point because then the only way to
increase the rate of user $k$ is to reallocate bandwidth from another user to user
$k$ (e.g., increasing $\xi_k$ and simultaneously decreasing $\xi_i$ by an equal amount,
for some $i\neq k$). The Pareto boundary is thus characterized by replacing the
inequality with equality in (6.10):

$$\partial\mathcal{R} = \left\{\left(R_1(\xi_1),\ldots,R_K(\xi_K)\right) : \xi_1,\ldots,\xi_K\ge 0,\ \xi_1+\ldots+\xi_K = 1\right\}. \tag{6.11}$$
The rate region is exemplified in Figure 6.6 for a setup with $K=2$ users
and $B=10$ MHz. In Figure 6.6(a), the two users have equal channel quality,
represented by $\frac{P\beta_1}{BN_0}=\frac{P\beta_2}{BN_0}=10$. The rate region is then symmetric, which implies
that max-min fairness is also achieved at the sum-rate maximizing operating
point. In this example, that operating point is $(R_1,R_2)=(22.0,22.0)$ Mbit/s.
The tradeoff created by dividing the bandwidth between the two users results
in a curved Pareto boundary. In Figure 6.6(b), the users have different
channel qualities, represented by $\frac{P\beta_1}{BN_0}=10$ and $\frac{P\beta_2}{BN_0}=5$. The rate region
remains curved but is now asymmetric. Max-min fairness is achieved at
$(19.1,19.1)$ Mbit/s, while the sum rate is maximized at $(26.7,13.3)$ Mbit/s.
The maximum sum rate is 40 Mbit/s, while the max-min fairness point achieves a
sum rate of 38.2 Mbit/s. The regions were generated numerically using (6.11).
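Under the stated SNRs, the operating points quoted for Figure 6.6 can be reproduced by sweeping $\xi_1=1-\xi_2$ along the Pareto boundary (6.11). The sketch below is a plain grid search (the grid density is an arbitrary choice), not necessarily the procedure used to generate the figure; `fdma_pareto_boundary` is a hypothetical helper name.

```python
import numpy as np

def fdma_pareto_boundary(snr1, snr2, B, num_points=100001):
    """Sample the two-user Pareto boundary (6.11) by sweeping xi_1 = 1 - xi_2.

    snr1, snr2 are the full-band SNRs P*beta_k/(B*N0); rates are in bit/s.
    """
    xi1 = np.linspace(1e-6, 1 - 1e-6, num_points)
    r1 = xi1 * B * np.log2(1 + snr1 / xi1)
    r2 = (1 - xi1) * B * np.log2(1 + snr2 / (1 - xi1))
    return xi1, r1, r2

B = 10e6
xi1, r1, r2 = fdma_pareto_boundary(snr1=10.0, snr2=5.0, B=B)

i_sr = np.argmax(r1 + r2)                 # sum-rate maximizing operating point
i_mm = np.argmax(np.minimum(r1, r2))      # max-min fairness operating point

print("sum rate point:", r1[i_sr] / 1e6, r2[i_sr] / 1e6, "at xi_1 =", xi1[i_sr])
print("max-min point :", r1[i_mm] / 1e6, r2[i_mm] / 1e6, "at xi_1 =", xi1[i_mm])
```

With these parameters, the sum-rate point should appear at $\xi_1\approx 2/3$, which matches dividing the bandwidth in proportion to the channel qualities (10:5), and the max-min fairness point at $\xi_1\approx 0.41$.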


Example 6.1. How does FDMA behave when the bandwidth is very large?

When $B\to\infty$, the rate in (6.9) can be approximated using (3.2) as

$$R_k(\xi_k) \approx \xi_k B\,\frac{P\beta_k}{\xi_k BN_0}\log_2(e) = \frac{P\beta_k}{N_0}\log_2(e). \tag{6.12}$$

This expression is independent of the fractions $\xi_1,\ldots,\xi_K$, which shows that
bandwidth allocation is easy when spectrum is abundant.
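To see the approximation (6.12) at work, the following sketch evaluates (6.9) for growing bandwidth and several fractions $\xi_k$ and divides each rate by the limit $\frac{P\beta_k}{N_0}\log_2(e)$. The value $P\beta_k/N_0=10^8$ Hz is a hypothetical choice.

```python
import numpy as np

# Hypothetical P*beta_k/N0 = 1e8 Hz, i.e., an SNR of 10 over a 10 MHz band
p_beta_over_n0 = 1e8

def fdma_rate(xi, B, p_beta_over_n0):
    """Rate (6.9) in bit/s: xi * B * log2(1 + P*beta_k / (xi * B * N0))."""
    return xi * B * np.log2(1 + p_beta_over_n0 / (xi * B))

limit = p_beta_over_n0 * np.log2(np.e)     # right-hand side of (6.12)

for B in [1e7, 1e9, 1e11]:
    ratios = [fdma_rate(xi, B, p_beta_over_n0) / limit for xi in (0.1, 0.5, 1.0)]
    print(f"B = {B:.0e} Hz: rate / limit =", np.round(ratios, 4))
```

At $B=10$ MHz the ratios still depend strongly on $\xi_k$, whereas they all approach 1 as the bandwidth grows.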
Figure 6.6: Examples of the uplink rate regions for K = 2 users when using orthogonal multiple
access based on FDMA. [Panel (b): Two users with different channel qualities.]
For a given finite amount of bandwidth, one can prove (by differentiation)
that the maximum sum rate is achieved by selecting the bandwidth fractions proportionally
to the channel gains:

$$\xi_k = \frac{\beta_k}{\sum_{i=1}^{K}\beta_i}. \tag{6.13}$$

By substituting this value into (6.9), the rate achieved by user $k$ is

$$R_k\left(\frac{\beta_k}{\sum_{i=1}^{K}\beta_i}\right) = \frac{\beta_k}{\sum_{i=1}^{K}\beta_i}\, C\left(\frac{P\sum_{i=1}^{K}\beta_i}{BN_0}\right) \text{ bit/s}. \tag{6.14}$$

The bandwidth is allocated so that all users obtain the same SNR value to
prevent the system from operating too far into the logarithmic regime of the
rate function. However, users with strong channels obtain a larger rate thanks
to a larger bandwidth fraction. The sum rate becomes

$$\sum_{k=1}^{K} R_k\left(\frac{\beta_k}{\sum_{i=1}^{K}\beta_i}\right) = C\left(\frac{P\sum_{k=1}^{K}\beta_k}{BN_0}\right) \text{ bit/s}. \tag{6.15}$$


Interestingly, this is the same rate as one would get over a point-to-point
MISO channel with $K$ antennas, the channel vector $\mathbf{h}=[\sqrt{\beta_1},\ldots,\sqrt{\beta_K}]^{\mathrm{T}}$,
and a total transmit power of $P$. However, in the FDMA scenario, each user
transmits from a single antenna with power $P$, so the total transmit power is
$KP$. The reason for the increased power in the multi-user system, compared
to the MISO system, is that the users transmit different signals in orthogonal
frequency bands; thus, there is no beamforming gain.
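The proportional allocation (6.13) and the sum rate (6.15) can be sanity-checked against random allocations. The sketch below assumes $K=3$ users with full-band SNRs 10, 5, and 2.5 and the normalization $P=N_0=1$; both are illustrative choices, and `fdma_sum_rate` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N0, P = 10e6, 1.0, 1.0                         # assumed normalization
beta = np.array([10.0, 5.0, 2.5]) * B * N0 / P    # gains giving full-band SNRs 10, 5, 2.5

def fdma_sum_rate(xi):
    """Sum of the per-user FDMA rates (6.9) for bandwidth fractions xi."""
    return np.sum(xi * B * np.log2(1 + P * beta / (xi * B * N0)))

xi_opt = beta / beta.sum()                                   # allocation (6.13)
closed_form = B * np.log2(1 + P * beta.sum() / (B * N0))     # sum rate (6.15)

# Random allocations drawn uniformly from the simplex xi_1 + xi_2 + xi_3 = 1
random_xi = rng.dirichlet(np.ones(3), size=10000)
best_random = max(fdma_sum_rate(xi) for xi in random_xi)

print(fdma_sum_rate(xi_opt) / 1e6, closed_form / 1e6, best_random / 1e6)
```

All users end up with the same in-band SNR (here 17.5), and no random allocation exceeds the closed-form value, as expected since the sum of the concave rates (6.9) is maximized by (6.13).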

FDMA is not the only OMA scheme. Another option is time-division
multiple access (TDMA), where the users take turns transmitting over the
entire bandwidth. Suppose user $k$ is active for a fraction of time denoted by
$\xi_k\in[0,1]$, for $k=1,\ldots,K$, which has been selected so that $\xi_1+\xi_2+\ldots+\xi_K\le 1$.
The user will then achieve a fraction $\xi_k$ of its single-user capacity, represented
by the rate

$$\xi_k C\left(\frac{P\beta_k}{BN_0}\right) \text{ bit/s}. \tag{6.16}$$

Interestingly, this rate is strictly smaller than the rate $\xi_k C\left(\frac{P\beta_k}{\xi_k BN_0}\right)$ in (6.9)
that is achieved by FDMA if the same fraction $\xi_k<1$ is assigned to the user
(equality is achieved for $\xi_k=1$, when only one user is served). It might seem
counterintuitive that FDMA outperforms TDMA since each user is assigned
the same fraction of the total time-frequency resources in both cases. The
reason behind this result is that the power amplifier is turned on and off in
TDMA; thus, even if the instantaneous transmit power is $P$ when the user
is transmitting, the time-average transmit power is reduced to $\xi_k P$. This
explains why the SNR obtained with TDMA equals $\xi_k$ times the SNR obtained with FDMA.
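A direct numerical comparison of (6.9) and (6.16) for the same fraction $\xi_k$ illustrates the TDMA penalty. The full-band SNR of 10 and $B=10$ MHz are assumed values.

```python
import numpy as np

B = 10e6
snr = 10.0                                   # assumed full-band SNR P*beta_k/(B*N0)
xi = np.array([0.1, 0.25, 0.5, 0.75, 1.0])   # same fraction for both schemes

fdma = xi * B * np.log2(1 + snr / xi)   # (6.9): power P concentrated in xi*B Hz
tdma = xi * B * np.log2(1 + snr)        # (6.16): full band, active a fraction xi of the time

for f, t, x in zip(fdma, tdma, xi):
    print(f"xi={x:.2f}: FDMA {f / 1e6:6.2f}  TDMA {t / 1e6:6.2f} Mbit/s")
```

The gap is largest for small $\xi_k$, where FDMA gains the most from concentrating the power in a narrow band, and it vanishes at $\xi_k=1$.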
Example 6.2. Can the time resources be divided orthogonally between the
users without turning off the power amplifiers?

Yes, this can be achieved by letting the users repeat their data symbols in
a way that allows the receiver to separate the transmissions. For example, in
a setup with $K=2$ users and $\xi_1=\xi_2=1/2$, the users can be assigned the
orthogonal vectors $[1,1]^{\mathrm{T}}$ and $[1,-1]^{\mathrm{T}}$, where all entries have unit magnitude.
The users multiply each data symbol with their respective vectors and transmit
the result as two consecutive symbols in time. The base station can undo the
operation by multiplying its received signal over the two consecutive symbol
times with the respective vectors. The SNR is improved by a factor $1/\xi_k=2$
(compared to TDMA) by the repeated transmission, while the orthogonality
ensures that there is no interference between the users. Since the multiplication
with the vectors spreads out each data symbol over time, the vectors are called
spreading sequences. This example represents a third type of OMA scheme
and is a special case of the general concept of code-division multiple access
(CDMA). While CDMA is a remedy to the SNR issue that TDMA suffers
from, it will not outperform FDMA, and it restricts $\xi_k$ to values that
match the available spreading sequences. Hence, based on its performance
and flexibility, FDMA remains the preferred option among the OMA schemes.
We refer to [26] for further details on CDMA and its extensions.
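The spreading and despreading steps of Example 6.2 can be written out in a few lines. In this Python sketch, the channel gains $h_1=0.9$ and $h_2=0.4$, the QPSK symbols, and the noise level are hypothetical; the receiver correlates the two received samples with each spreading sequence, which removes the other user's signal because the sequences are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
# Spreading sequences from Example 6.2 (K = 2 users, xi_1 = xi_2 = 1/2)
c1 = np.array([1.0, 1.0])
c2 = np.array([1.0, -1.0])

# One QPSK data symbol per user (hypothetical values for illustration)
s1 = (1 + 1j) / np.sqrt(2)
s2 = (1 - 1j) / np.sqrt(2)

# Each user sends its symbol times its sequence over two symbol times;
# the channel gains h1, h2 are assumed known at the base station.
h1, h2 = 0.9, 0.4
noise = 0.01 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
y = h1 * s1 * c1 + h2 * s2 * c2 + noise   # received samples at times 1 and 2

# Despreading: correlate with each sequence. The other user cancels because
# c1 @ c2 = 0, while the desired symbol is collected coherently twice.
z1 = c1 @ y / (c1 @ c1)
z2 = c2 @ y / (c2 @ c2)
print(z1 / h1, z2 / h2)   # close to s1 and s2
```

Correlating over two samples doubles the desired signal amplitude (four times the power) while the noise power only doubles, which is the factor-of-2 SNR gain mentioned in the example.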


6.3.2 Non-Orthogonal Multiple Access 


The previous section demonstrated how a base station can serve multiple 
users by dividing the time-frequency resources between them in an orthogonal 
manner; for example, each portion of the frequency band can be assigned to one 
user. The reason for the orthogonal resource division is to avoid interference. 
Still, such a protective system design might not be optimal for maximizing our 
utility function (e.g., max-min fairness or maximum sum rate). An alternative 
solution is non-orthogonal multiple access (NOMA), where the K users share 
the same time-frequency resources, and the interference is instead managed by 
signal processing. In this section, we will show that the rate region obtained 
by NOMA is larger than the region achieved by FDMA (and other OMA 
schemes). In fact, it is the largest rate region that can be obtained in the 
considered setup, called the capacity region. 

For brevity, we will describe the concept of NOMA in the case of $K=2$
users that share a bandwidth of $B$ Hz. Each user has a maximum power $P$
and transmits with some power $P_k^{\mathrm{ul}}\in[0,P]$, for $k=1,2$. We consider a
discrete memoryless multiple access channel where the two users transmit
simultaneously, as illustrated in Figure 6.7. The received signal is

$$y[l] = h_1x_1[l] + h_2x_2[l] + n[l], \tag{6.17}$$

where $x_k[l]$ is the input signal from user $k$ at the discrete time $l$ and the energy
per symbol is $P_k^{\mathrm{ul}}/B$ (because there are $B$ symbols per second). The complex
channel response from user $k$ is denoted by $h_k$ and assumed deterministic, so
it can, for example, represent a LOS channel. The squared magnitude of the channel is denoted
as $\beta_k=|h_k|^2$. Moreover, $n[l]\sim\mathcal{N}_{\mathbb{C}}(0,N_0)$ is the independent receiver noise.
There are two input signals $x_1[l],x_2[l]$ but only one output signal $y[l]$. It is
nevertheless possible for the receiver to extract data from both signals if
the data is appropriately encoded. Note that (6.17) is an extension of the
discrete memoryless channel in (2.130) to the case where two users transmit
simultaneously and therefore interfere with each other.

Figure 6.7: A discrete memoryless multiple access channel with $K=2$ users. The two input
signals are $x_1[l]$ and $x_2[l]$, where $l$ is a discrete-time index. The output is $y[l]=h_1x_1[l]+h_2x_2[l]+n[l]$,
where $h_1,h_2$ are the channel responses and $n[l]$ is the independent complex Gaussian
receiver noise.

Suppose the input signals are Gaussian distributed: $x_k[l]\sim\mathcal{N}_{\mathbb{C}}(0,P_k^{\mathrm{ul}}/B)$
for $k=1,2$. This is the optimal input distribution in the point-to-point case
and can be proved to be optimal also for multiple access channels. We refer to
[26, Appendix B.9] for details. If the receiver focuses on user 1, the received
signal in (6.17) can be rewritten as

$$y[l] = h_1x_1[l] + n_1'[l], \tag{6.18}$$

where $n_1'[l]=h_2x_2[l]+n[l]\sim\mathcal{N}_{\mathbb{C}}(0,P_2^{\mathrm{ul}}\beta_2/B+N_0)$ is an independent complex
Gaussian distributed variable. It is not conventional noise since it consists of
both an interfering signal and receiver noise. However, from the perspective
of decoding the signal from user 1, it takes the role of an effective noise term
distributed in the same way as receiver noise (apart from the larger variance).
Hence, it follows from Corollary 2.1 that an achievable rate of user 1 is

$$R_1 = B\log_2\left(1+\frac{P_1^{\mathrm{ul}}\beta_1/B}{P_2^{\mathrm{ul}}\beta_2/B+N_0}\right) = C\left(\frac{P_1^{\mathrm{ul}}\beta_1}{P_2^{\mathrm{ul}}\beta_2+BN_0}\right) \text{ bit/s}, \tag{6.19}$$

where we utilized the fact that $\beta_1=|h_1|^2$ and $\beta_2=|h_2|^2$. The term $\frac{P_1^{\mathrm{ul}}\beta_1}{P_2^{\mathrm{ul}}\beta_2+BN_0}$
is the SINR, but here it takes the role of an effective SNR; that is, user 1
achieves the same rate as in a point-to-point SISO channel with the SNR
value $\frac{P_1^{\mathrm{ul}}\beta_1}{P_2^{\mathrm{ul}}\beta_2+BN_0}$. This also means that the data can be encoded and decoded
identically, which aligns with our previous assumption of having Gaussian distributed
data symbols, which achieve the capacity of point-to-point channels.

Once the receiver has decoded the signal sequence $\{x_1[l]\}$ from user 1, it
can subtract it from the original received signal in (6.17) as

$$y[l] - h_1x_1[l] = h_2x_2[l] + n[l]. \tag{6.20}$$

The result looks like the received signal of a conventional SISO channel with
the additive receiver noise $n[l]$ but no interference. Hence, the achievable rate
of user 2 is

$$R_2 = C\left(\frac{P_2^{\mathrm{ul}}\beta_2}{BN_0}\right) \text{ bit/s}. \tag{6.21}$$


Interestingly, user 2 achieves the same rate as if it had been assigned the entire 
bandwidth in OMA. This is enabled by the procedure of first decoding the 
signal from user 1 and then subtracting it from the received signal. We followed 
a similar procedure in Section 3.4.3 to sequentially decode the transmitted 
streams over a point-to-point MIMO channel, in which case we called it 
successive interference cancellation (SIC). We will use the same terminology 
here and recall that it is a non-linear receiver processing scheme because we 
must decode one signal sequence entirely before subtracting interference. 
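It follows from (6.19) and (6.21) that the sum rate is

$$\begin{aligned}
R_1+R_2 &= B\log_2\left(1+\frac{P_1^{\mathrm{ul}}\beta_1}{P_2^{\mathrm{ul}}\beta_2+BN_0}\right) + B\log_2\left(1+\frac{P_2^{\mathrm{ul}}\beta_2}{BN_0}\right)\\
&= B\log_2\left(\frac{P_1^{\mathrm{ul}}\beta_1+P_2^{\mathrm{ul}}\beta_2+BN_0}{P_2^{\mathrm{ul}}\beta_2+BN_0}\right) + B\log_2\left(\frac{P_2^{\mathrm{ul}}\beta_2+BN_0}{BN_0}\right)\\
&= B\log_2\left(1+\frac{P_1^{\mathrm{ul}}\beta_1+P_2^{\mathrm{ul}}\beta_2}{BN_0}\right) = C\left(\frac{P_1^{\mathrm{ul}}\beta_1+P_2^{\mathrm{ul}}\beta_2}{BN_0}\right).
\end{aligned} \tag{6.22}$$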


We can notice from (6.22) that the sum rate is an increasing function of both
$P_1^{\mathrm{ul}}$ and $P_2^{\mathrm{ul}}$; thus, it is maximized when both users transmit at their maximum
power $P$. This implies that any tuple of achievable rates $(R_1,R_2)$ must satisfy

$$R_1+R_2 \le C\left(\frac{P\beta_1+P\beta_2}{BN_0}\right). \tag{6.23}$$


The sum-rate expression is symmetric with respect to the two users; thus, the 
same sum rate can be achieved if the receiver first decodes the signal from 
user 2, subtracts that signal from the originally received signal, and finally 
decodes the signal from user 1. However, in the latter case, it is user 1 that
achieves the same rate as if assigned the entire bandwidth in OMA mode. In
summary, we know that we can achieve the points

$$(R_1,R_2) = \left(C\left(\frac{P\beta_1}{P\beta_2+BN_0}\right), C\left(\frac{P\beta_2}{BN_0}\right)\right), \tag{6.24}$$

$$(R_1,R_2) = \left(C\left(\frac{P\beta_1}{BN_0}\right), C\left(\frac{P\beta_2}{P\beta_1+BN_0}\right)\right). \tag{6.25}$$

Figure 6.8: When using NOMA, the uplink rate region with $K=2$ users is characterized by
the three line segments shown in the figure. The Pareto boundary is the diagonal line segment
between the operating points in (6.24) and (6.25), marked with filled circles, which are achieved
by decoding the signals from one user first and subtracting its interference before decoding the
signals from the other user.

These points are marked with filled circles in Figure 6.8, and it is indicated 
that the first point is achieved by decoding the signal from user 1 first, while 
the second point is achieved by decoding the signal from user 2 first. By
switching between operating at these different points over time, a procedure
called time-sharing, we can achieve any point on the dashed line segment
drawn between the two points. This is the Pareto boundary of the rate region,
and it is a segment of the line defined by the maximum sum rate equation
$R_1+R_2=C\left(\frac{P\beta_1+P\beta_2}{BN_0}\right)$.

It is also possible to achieve any point whose entries are no larger than
those of some point on the Pareto boundary; that is, any point between
the axes and the dashed line segments in the figure. The vertical segment is a
portion of the line defined by $R_1=C\left(\frac{P\beta_1}{BN_0}\right)$, while the horizontal segment is
a portion of the line defined by $R_2=C\left(\frac{P\beta_2}{BN_0}\right)$. Hence, the rate region is the
pentagon determined by these three lines and the two axes and can be characterized as

$$\mathcal{R} = \left\{(R_1,R_2): 0\le R_1\le C\left(\frac{P\beta_1}{BN_0}\right),\ 0\le R_2\le C\left(\frac{P\beta_2}{BN_0}\right),\ R_1+R_2\le C\left(\frac{P\beta_1+P\beta_2}{BN_0}\right)\right\}. \tag{6.26}$$
Any point that satisfies the three conditions in (6.26) belongs to the rate
region. We have demonstrated the achievability of this rate region by SIC and
time-sharing. It can also be proved that no other NOMA scheme can achieve
a larger rate region; we refer to [42, Ch. 14] for details. The Pareto boundary
is obtained when there is equality in the third condition of (6.26) so that the
maximum sum rate is achieved:

$$\partial\mathcal{R} = \left\{(R_1,R_2): 0\le R_1\le C\left(\frac{P\beta_1}{BN_0}\right),\ 0\le R_2\le C\left(\frac{P\beta_2}{BN_0}\right),\ R_1+R_2 = C\left(\frac{P\beta_1+P\beta_2}{BN_0}\right)\right\}. \tag{6.27}$$
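The corner points (6.24) and (6.25) and the pentagon (6.26) can be evaluated directly for the running example with full-band SNRs $P\beta_1/(BN_0)=10$ and $P\beta_2/(BN_0)=5$. The membership test below is a hypothetical helper that simply checks the three conditions of (6.26) with a small numerical tolerance.

```python
import numpy as np

B = 10e6
snr1, snr2 = 10.0, 5.0     # P*beta_k/(B*N0) from the running example

def C(x):
    """C(x) = B * log2(1 + x) in bit/s, as in (6.7)."""
    return B * np.log2(1 + x)

# Corner points (6.24) and (6.25). Dividing by B*N0 turns
# P*beta_1 / (P*beta_2 + B*N0) into snr1 / (snr2 + 1), and so on.
corner_a = (C(snr1 / (snr2 + 1)), C(snr2))   # decode user 1 first
corner_b = (C(snr1), C(snr2 / (snr1 + 1)))   # decode user 2 first
sum_bound = C(snr1 + snr2)                   # right-hand side of (6.23)

def in_noma_region(r1, r2, tol=1e-6):
    """Check the three conditions of (6.26) for the rate pair (r1, r2) in bit/s."""
    return (0 <= r1 <= C(snr1) + tol and 0 <= r2 <= C(snr2) + tol
            and r1 + r2 <= sum_bound + tol)

# Time-sharing with weight tau between the two SIC decoding orders
tau = 0.5
shared = tuple(tau * a + (1 - tau) * b for a, b in zip(corner_a, corner_b))
print([r / 1e6 for r in corner_a], [r / 1e6 for r in corner_b], sum_bound / 1e6)
```

Both decoding orders reach the 40 Mbit/s sum rate, the time-shared point stays on the Pareto boundary, and the pair (20, 20) Mbit/s passes the membership test because each entry is below the corresponding single-user capacity.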


Example 6.3. Compare the sum rate in (6.23) with that of a point-to-point
MISO channel with the channel vector $\mathbf{h}=[\sqrt{\beta_1},\sqrt{\beta_2}]^{\mathrm{T}}$.

The MISO channel capacity is given in (3.47), and by substituting $q=P/B$
into the expression, we obtain

$$B\log_2\left(1+\frac{P\|\mathbf{h}\|^2}{BN_0}\right) = B\log_2\left(1+\frac{P\beta_1+P\beta_2}{BN_0}\right) \text{ bit/s}. \tag{6.28}$$

This is the same as the sum rate in (6.23), but the latter is achieved using
a total transmit power of $2P$ instead of $P$ because there are two users.
The NOMA setup is instead mathematically identical to a MISO system
that uses suboptimal precoding where each antenna transmits different data.
We first came across SIC in Section 3.4.3 when analyzing how to decode
the received data with arbitrary precoding. Using the notation from that
section, the precoding matrix is $\mathbf{P}=\mathbf{I}_2$ and the power allocation matrix is
$\mathbf{Q}=\mathrm{diag}(P/B,P/B)$.


To compare the rate regions attained by the orthogonal and non-orthogonal
types of multiple access, we will continue the example from Figure 6.6(b).
Recall that the two users have different channel qualities: $\frac{P\beta_1}{BN_0}=10$ and
$\frac{P\beta_2}{BN_0}=5$. Figure 6.9 shows the rate regions obtained with OMA/FDMA and
NOMA. We notice that NOMA achieves a larger rate region, containing all the 
operating points OMA achieves and some additional points along the Pareto 
boundary. When using NOMA, all the points on the Pareto boundary maximize 
the sum rate and represent different ways of allocating the sum rate between 
the users. One point on the Pareto boundary is also optimal in the max-min 
fairness sense; thus, we can maximize both utility functions simultaneously 
when using NOMA. Interestingly, the maximum sum rate is 40 Mbit/s for
both FDMA and NOMA. While this specific value depends on the simulation
assumptions, the equivalence is not unique to this example; it follows from
comparing the maximum sum rate expression for FDMA in (6.15) with
the corresponding expression for NOMA in (6.23), which are identical. There
is, however, a rate difference when it comes to max-min fairness, which is
achieved at (20, 20) Mbit/s with NOMA and at (19.1, 19.1) Mbit/s when using
orthogonal access. Hence, if the system is designed for max-min fairness, both
users can achieve a 5% higher rate when using NOMA.

Figure 6.9: Example of uplink rate regions for K = 2 users with different channel qualities
when using either NOMA or OMA based on FDMA. This is a continuation of the example from
Figure 6.6(b).


This example indicates the benefit that NOMA provides over OMA: the 
maximum sum rate value is the same but can be allocated between the users in 
a variety of different ways, while FDMA only achieves it using one specific rate 
division. For example, NOMA allows for the max-min fairness and sum-rate 
utilities to be maximized simultaneously; however, there is generally a tradeoff 
between these performance targets when considering OMA. If the users have 
widely different channel gains, the max-min fairness point with NOMA might 
be at a corner point of the Pareto boundary. At this point, the user with the 
weakest channel gain achieves its single-user capacity by being decoded last, 
while other users might achieve higher rates than that. 

The rate region with NOMA for $K>2$ users can be formulated and
achieved similarly to what was described earlier in this section. Recall that
three equations characterize the rate region in (6.26) in the two-user case:
each user's rate must be lower than or equal to the respective single-user
capacity, and the sum rate is upper bounded by the capacity of a point-to-point
MISO channel with the same transmit power $P$. When extending this to
the $K$-user case, there will be $2^K-1$ equations, each describing how the sum
rate of a certain subset of the users is upper bounded by the point-to-point
MISO channel capacity with a channel vector containing the users' channel
coefficients.³ More precisely, the rate region can be characterized as follows
(where the time index $l$ has been omitted for brevity) [42, Sec. 14.3.5].


Theorem 6.1. Consider a $K$-user discrete memoryless multiple access channel
with the inputs $x_1,\ldots,x_K\in\mathbb{C}$ and the output $y\in\mathbb{C}$ given by

$$y = \sum_{k=1}^{K} h_kx_k + n, \tag{6.29}$$

where $n\sim\mathcal{N}_{\mathbb{C}}(0,N_0)$ is independent noise and $h_1,\ldots,h_K\in\mathbb{C}$ are constant
channel coefficients known at the output. Suppose the input distributions
are feasible whenever $\mathbb{E}\{|x_k|^2\}\le P/B$, where $P$ is the transmit power and
$B$ is the bandwidth (and symbol rate). If $R_k$ denotes the rate of user $k$ and
$\beta_k=|h_k|^2$, the capacity region is given by

$$\mathcal{R} = \left\{(R_1,\ldots,R_K): 0\le\sum_{k\in\mathcal{K}}R_k\le C\left(\frac{P\sum_{k\in\mathcal{K}}\beta_k}{BN_0}\right) \text{ for all } \mathcal{K}\subseteq\{1,\ldots,K\}\right\}. \tag{6.30}$$


Notice that $\mathcal{K}$ in (6.30) denotes a subset of the indices of the $K$ users,
and there are $2^K-1$ different non-empty subsets to consider. For example, if
$K=3$, the seven subsets are {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, and {1, 2, 3}.

Figure 6.10 exemplifies the rate region achieved with NOMA for $K=3$
users, with $\frac{P\beta_1}{BN_0}=10$, $\frac{P\beta_2}{BN_0}=5$, $\frac{P\beta_3}{BN_0}=2.5$, and $B=10$ MHz. The Pareto
boundary is the area enclosed by the solid line segments. The six corner points
are achieved by letting the users transmit at maximum power and then decoding
their signals sequentially in different orders using SIC. Other points on the
Pareto boundary can be achieved by time-sharing between operating at the
different corner points. All the points on the Pareto boundary achieve the
same sum rate

$$\sum_{i=1}^{K} R_i = C\left(\frac{P\sum_{i=1}^{K}\beta_i}{BN_0}\right), \tag{6.31}$$

which is also the same as the maximum sum rate in (6.15) achieved by FDMA.
As stated earlier, the key difference is that NOMA can divide the sum rate
between the users in multiple ways, while FDMA cannot.
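For $K=3$, the six corner points of Figure 6.10 correspond to the $3!$ possible decoding orders. The sketch below computes the SIC rates for each order (the user decoded at each step treats the not-yet-decoded users as noise) and checks all $2^K-1$ subset constraints of (6.30). The SNR values are those quoted for Figure 6.10; the helper names are hypothetical.

```python
import itertools
import numpy as np

B = 10e6
snr = np.array([10.0, 5.0, 2.5])     # P*beta_k/(B*N0) as quoted for Figure 6.10
K = len(snr)

def C(x):
    """C(x) = B * log2(1 + x) in bit/s, as in (6.7)."""
    return B * np.log2(1 + x)

def sic_rates(order):
    """Rates when the users are decoded in the given order using SIC.

    The user decoded at step i treats all users decoded later as noise.
    """
    rates = np.zeros(K)
    for i, k in enumerate(order):
        interference = snr[list(order[i + 1:])].sum()
        rates[k] = C(snr[k] / (interference + 1))
    return rates

def in_capacity_region(rates, tol=1e-3):
    """Check all 2^K - 1 non-empty subset constraints of (6.30)."""
    for size in range(1, K + 1):
        for subset in itertools.combinations(range(K), size):
            idx = list(subset)
            if rates[idx].sum() > C(snr[idx].sum()) + tol:
                return False
    return True

corners = [sic_rates(order) for order in itertools.permutations(range(K))]
for c in corners:
    print(np.round(c / 1e6, 2), round(c.sum() / 1e6, 2))
```

Every decoding order yields a different rate triple, but all of them satisfy (6.30) and share the sum rate (6.31), since the per-user logarithms telescope.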


³ $K$ of these subsets correspond to the single-user capacity bounds since each of those subsets
includes a single user. The number of subsets that include $k$ users is $\binom{K}{k}$. When we add all the
subsets for $k=1,\ldots,K$, it follows from the binomial theorem that there are $2^K-1$ subsets.
Figure 6.10: Example of an uplink rate region for K = 3 users when using NOMA.


6.3.3 Uplink Multi-User MIMO with Non-Linear Processing 


The underlying reason that NOMA cannot increase the sum rate compared 
with OMA is that the base station is only equipped with a single antenna; thus, 
it can only distinguish one signal dimension per received symbol. Different 
access schemes can allocate different fractions of this dimension to different 
users but not create additional signal dimensions. It is instructive to compare 
multiple access schemes with the operation of point-to-point channels. As 
mentioned in Example 6.3, the multiple access system model is mathematically 
indistinguishable from a MISO channel where the K transmit antennas are 
sending different messages (instead of using MRT) and the receiver has M = 1 
antenna. The achievable rate for such a setup is given by (3.106) and can be 
shown to coincide with the sum rate achieved by FDMA and NOMA. Since 
the base station is only equipped with a single antenna, the multiplexing gain 
is min(M, K) = M = 1, which is another way to quantify that the users share 
one signal dimension. However, this analogy reveals a potential solution to 
the dimensionality bottleneck: if the base station were equipped with
$M$ antennas, for some $M\ge K$, the maximum multiplexing gain would become
$\min(M,K)=K$. In that case, the sum rate can possibly be improved by
serving multiple users simultaneously over the entire bandwidth; the more
users the better, as long as $K\le M$. This is the essence of multi-user MIMO.
Note that the MIMO terminology is utilized even when each user device only 
has a single antenna because the multiple inputs are the multiple transmitting 
users, and the multiple outputs are the multiple receive antennas. 
Figure 6.11: A discrete memoryless uplink multi-user MIMO channel with $K=2$ single-antenna
users and $M$ receive antennas. The two input signals are $x_1[l]$ and $x_2[l]$, where $l$ is
a discrete-time index. The output is $\mathbf{y}[l]=\mathbf{h}_1x_1[l]+\mathbf{h}_2x_2[l]+\mathbf{n}[l]$, where $\mathbf{h}_1,\mathbf{h}_2$ are the
channel vectors and $\mathbf{n}[l]$ is the independent complex Gaussian receiver noise.


We begin by considering a discrete memoryless channel with $K=2$ single-antenna
user devices and a receiving base station equipped with $M\ge 2$
antennas. Both users transmit simultaneously over a bandwidth of $B$ Hz, and
their transmit powers are $P_k^{\mathrm{ul}}\in[0,P]$, for $k=1,2$, where $P$ is the maximum
power. The received signal $\mathbf{y}[l]\in\mathbb{C}^M$ at the discrete time $l$ is

$$\mathbf{y}[l] = \mathbf{h}_1x_1[l] + \mathbf{h}_2x_2[l] + \mathbf{n}[l], \tag{6.32}$$

where $x_k[l]$ is the input signal from user $k$, for $k=1,2$. The energy per
symbol is $P_k^{\mathrm{ul}}/B$, and we assume Gaussian codebooks, such that $x_k[l]\sim\mathcal{N}_{\mathbb{C}}(0,P_k^{\mathrm{ul}}/B)$.
The channel vector from user $k$ is denoted by $\mathbf{h}_k\in\mathbb{C}^M$, while
$\mathbf{n}[l]\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},N_0\mathbf{I}_M)$ is independent receiver noise. A block diagram of this
uplink multi-user MIMO channel is provided in Figure 6.11.

We will characterize the rate region by following the same non-linear 
receiver processing as in the case of NOMA, namely SIC. If the receiver 
focuses on user 1, the received signal in (6.32) can be rewritten as 


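$$\mathbf{y}[l] = \mathbf{h}_1x_1[l] + \mathbf{n}_1'[l], \tag{6.33}$$

where $\mathbf{n}_1'[l]=\mathbf{h}_2x_2[l]+\mathbf{n}[l]\sim\mathcal{N}_{\mathbb{C}}\left(\mathbf{0},\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)$ is an independent
complex Gaussian distributed variable. This effective noise term contains
both an interfering signal and receiver noise. Since the covariance matrix
$\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M$ has non-zero off-diagonal entries, the effective noise is colored.
As described in Section 2.2.4, colored noise can be whitened by multiplying
with the inverse square root of the covariance matrix of the noise:

$$\left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1/2}\mathbf{y}[l] = \left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1/2}\mathbf{h}_1x_1[l] + \mathbf{n}_1''[l], \tag{6.34}$$

where the new effective noise $\mathbf{n}_1''[l]=\left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1/2}\mathbf{n}_1'[l]\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{I}_M)$
is spatially white. We notice that (6.34) is the system model of a SIMO channel
of the type considered in Corollary 3.1, but the physical channel vector $\mathbf{h}_1$ to
user 1 is replaced by the effective channel vector $\left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1/2}\mathbf{h}_1$.
Hence, the achievable rate of user 1 is

$$\begin{aligned}
R_1 &= B\log_2\left(1+\frac{P_1^{\mathrm{ul}}}{B}\left\|\left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1/2}\mathbf{h}_1\right\|^2\right)\\
&= B\log_2\left(1+P_1^{\mathrm{ul}}\mathbf{h}_1^{\mathrm{H}}\left(P_2^{\mathrm{ul}}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_1\right)\\
&= C\left(P_1^{\mathrm{ul}}\mathbf{h}_1^{\mathrm{H}}\left(P_2^{\mathrm{ul}}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_1\right) \text{ bit/s}.
\end{aligned} \tag{6.35}$$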


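This rate is achieved by applying an MRC vector based on the effective channel
to the whitened received signal in (6.34). Instead of carrying out the whitening
and MRC as two separate steps, the combining vector

$$\mathbf{w}_1 = \left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1/2}\left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1/2}\mathbf{h}_1 = \left(\frac{P_2^{\mathrm{ul}}}{B}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}+N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_1 \tag{6.36}$$

can be applied to the original received signal in (6.33). Since the receiver
computes the inner product $\mathbf{w}_1^{\mathrm{H}}\mathbf{y}[l]$, receive combining is a linear processing
scheme. We call this LMMSE combining since it can be shown, similarly to
Example 3.4, that $\hat{x}_1[l]=\frac{P_1^{\mathrm{ul}}/B}{1+(P_1^{\mathrm{ul}}/B)\mathbf{h}_1^{\mathrm{H}}\mathbf{w}_1}\mathbf{w}_1^{\mathrm{H}}\mathbf{y}[l]$ is the LMMSE estimate of $x_1[l]$.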
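The equivalence between whitening followed by MRC and the single-step combining vector (6.36) can be verified numerically. The following sketch assumes $M=4$, i.i.d. Rayleigh-fading channel realizations, and hypothetical values of $P_1^{\mathrm{ul}}$, $P_2^{\mathrm{ul}}$, and $N_0$; it computes the inverse square root of the effective noise covariance through an eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(2)
M, B, N0 = 4, 10e6, 1e-7      # hypothetical: N0 in W/Hz, so P/(B*N0) = 1 per antenna
P1 = P2 = 1.0                 # hypothetical transmit powers P_1^ul, P_2^ul in W

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h1, h2 = crandn(M), crandn(M)

# Covariance of the effective noise n_1'[l] = h2 x2[l] + n[l] in (6.33)
Cov = P2 / B * np.outer(h2, h2.conj()) + N0 * np.eye(M)

# Inverse square root via eigendecomposition (Cov is Hermitian positive definite)
eigval, eigvec = np.linalg.eigh(Cov)
Cov_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.conj().T

g1 = Cov_inv_sqrt @ h1        # effective channel after whitening, as in (6.34)
rate_whitened = B * np.log2(1 + P1 / B * np.linalg.norm(g1) ** 2)

# Same rate from the closed form (6.35)
Q = P2 * np.outer(h2, h2.conj()) + B * N0 * np.eye(M)
rate_closed = B * np.log2(1 + np.real(P1 * h1.conj() @ np.linalg.solve(Q, h1)))

# LMMSE combining vector (6.36) versus whitening followed by MRC
w_lmmse = np.linalg.solve(Cov, h1)
w_two_step = Cov_inv_sqrt @ g1

# Plain MRC (w = h1) ignores the interference structure and gives a lower SINR
sinr_mrc = P1 * abs(h1.conj() @ h1) ** 2 / np.real(h1.conj() @ Q @ h1)
rate_mrc = B * np.log2(1 + sinr_mrc)
print(rate_whitened / 1e6, rate_closed / 1e6, rate_mrc / 1e6)
```

Solving a linear system, as in `w_lmmse`, avoids forming the matrix square root explicitly; the two-step version is kept here only to mirror the derivation.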


Once the receiver has decoded the signal sequence $\{x_1[l]\}$ from user 1, it
can subtract it from the original received signal in (6.32) as

$$\mathbf{y}[l] - \mathbf{h}_1x_1[l] = \mathbf{h}_2x_2[l] + \mathbf{n}[l]. \tag{6.37}$$

This resembles the received signal of a conventional SIMO channel with white
receiver noise; thus, the achievable rate is equal to the single-user capacity of
user 2:

$$R_2 = C\left(\frac{P_2^{\mathrm{ul}}}{BN_0}\|\mathbf{h}_2\|^2\right) \text{ bit/s}. \tag{6.38}$$

This rate is achieved by applying an MRC vector $\mathbf{w}_2=\mathbf{h}_2/\|\mathbf{h}_2\|$ to the received
signal in (6.37) after the interference cancellation. Notice that the receiver
processing related to user 2 is non-linear since we are not only computing an
inner product between the received signal and a combining vector but also
subtracting interference caused by the decoded signal from user 1.

The sum rate can be computed by adding (6.35) to (6.38), but some 
lengthy matrix algebra of the kind in Section 3.4.3 is required to simplify 
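the expression. However, we can take a shortcut by interpreting (6.32) as a
point-to-point MIMO channel by writing the received signal as

$$\mathbf{y}[l] = \underbrace{[\mathbf{h}_1\ \mathbf{h}_2]}_{=\mathbf{H}}\,\underbrace{\mathbf{I}_2}_{=\mathbf{P}}\,\underbrace{\begin{bmatrix}x_1[l]\\ x_2[l]\end{bmatrix}}_{=\mathbf{x}[l]} + \mathbf{n}[l], \tag{6.39}$$

where $\mathbf{P}=\mathbf{I}_2$ is the precoding matrix and the signal vector $\mathbf{x}[l]\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{Q})$
has the covariance matrix $\mathbf{Q}=\mathrm{diag}(P_1^{\mathrm{ul}}/B,P_2^{\mathrm{ul}}/B)$. The diagonal precoding
matrix indicates that the two users transmit independently encoded signals,
as required in a multiple access channel. It then follows from (3.106) that the
sum rate is

$$R_1+R_2 = B\log_2\left(\det\left(\mathbf{I}_M+\frac{P_1^{\mathrm{ul}}}{BN_0}\mathbf{h}_1\mathbf{h}_1^{\mathrm{H}}+\frac{P_2^{\mathrm{ul}}}{BN_0}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}\right)\right). \tag{6.40}$$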


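The expression in (6.40) is an increasing function of both $P_1^{\mathrm{ul}}$ and $P_2^{\mathrm{ul}}$; thus,
it is maximized when both users transmit at their maximum power $P$. This
implies that any tuple of achievable rates $(R_1,R_2)$ must satisfy

$$R_1+R_2 \le B\log_2\left(\det\left(\mathbf{I}_M+\frac{P}{BN_0}\mathbf{h}_1\mathbf{h}_1^{\mathrm{H}}+\frac{P}{BN_0}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}\right)\right). \tag{6.41}$$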


The sum-rate expression is symmetric with respect to the two users, 
indicating that there are multiple ways of achieving it. The procedure of 
decoding the signal from user 1 first and removing its interference before 
decoding the signal from user 2 is only one of these ways. By exchanging the 
roles of the two users, another operating point can be achieved. These points 
are marked with filled circles in Figure 6.12 and given by 


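$$(R_1, R_2) = \left(C\left(P\mathbf{h}_1^{\mathrm{H}}\left(P\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}} + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_1\right), C\left(\frac{P}{BN_0}\|\mathbf{h}_2\|^2\right)\right), \tag{6.42}$$

$$(R_1, R_2) = \left(C\left(\frac{P}{BN_0}\|\mathbf{h}_1\|^2\right), C\left(P\mathbf{h}_2^{\mathrm{H}}\left(P\mathbf{h}_1\mathbf{h}_1^{\mathrm{H}} + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_2\right)\right). \tag{6.43}$$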
The line segment between these points is the Pareto boundary. The pentagon
structure of the rate region is clearly the same as with NOMA, but the rate
points are computed differently; in fact, NOMA is the special case of multi-user
MIMO obtained with M = 1. The complete rate region can be defined as


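$$\mathcal{R} = \left\{(R_1, R_2) : \begin{array}{l} 0 \leq R_1 \leq C\left(\frac{P}{BN_0}\|\mathbf{h}_1\|^2\right), \quad 0 \leq R_2 \leq C\left(\frac{P}{BN_0}\|\mathbf{h}_2\|^2\right), \\ R_1 + R_2 \leq B\log_2\left(\det\left(\mathbf{I}_M + \frac{P}{BN_0}\mathbf{h}_1\mathbf{h}_1^{\mathrm{H}} + \frac{P}{BN_0}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}\right)\right) \end{array}\right\}. \tag{6.44}$$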
Figure 6.12: The uplink rate region with K = 2 users is characterized by three line segments
when using multi-user MIMO. The Pareto boundary is the diagonal line segment between the 
operating points in (6.42) and (6.43), marked with filled circles, which are achieved by decoding 
the signals from one user first and then subtracting its interference before decoding the signals 
from the other user. 


Any point that satisfies the three inequalities in (6.44) belongs to the rate region.
The Pareto boundary is obtained when the third inequality holds with equality,
so that the maximum sum rate is achieved:


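$$\partial\mathcal{R} = \left\{(R_1, R_2) : \begin{array}{l} 0 \leq R_1 \leq C\left(\frac{P}{BN_0}\|\mathbf{h}_1\|^2\right), \quad 0 \leq R_2 \leq C\left(\frac{P}{BN_0}\|\mathbf{h}_2\|^2\right), \\ R_1 + R_2 = B\log_2\left(\det\left(\mathbf{I}_M + \frac{P}{BN_0}\mathbf{h}_1\mathbf{h}_1^{\mathrm{H}} + \frac{P}{BN_0}\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}\right)\right) \end{array}\right\}. \tag{6.45}$$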
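The following Python sketch (standard library only) checks the corner points (6.42) and (6.43) against the sum-rate bound (6.41). It uses the two-user example that appears later in this section. It is an illustration under assumptions, not the code behind the book's figures:

- The hypothetical helper `ula` assumes the half-wavelength ULA response $[\mathbf{a}_M(\varphi)]_m = e^{-j\pi m\sin(\varphi)}$.
- The SNR values 10 and 5 stand for $P\beta_k/(BN_0)$.
- We normalize $P = BN_0 = 1$ and $B = 1$, so rates are in bit/symbol.
- `eye_plus`, `det`, `solve`, and `quad_inv` are hypothetical names for plain Gaussian-elimination routines written only for this example.

```python
import cmath
import math


def ula(M, phi):
    # Assumed half-wavelength ULA response: [a]_m = exp(-j*pi*m*sin(phi)), so ||a||^2 = M.
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


def eye_plus(M, vecs):
    # Returns I_M + sum_k v_k v_k^H as a list of rows.
    return [[(1.0 if r == c else 0.0) + sum(v[r] * v[c].conjugate() for v in vecs)
             for c in range(M)] for r in range(M)]


def det(A):
    # Determinant via Gaussian elimination with partial pivoting.
    n, T, d = len(A), [list(row) for row in A], 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        if p != c:
            T[c], T[p], d = T[p], T[c], -d
        d *= T[c][c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n):
                T[r][j] -= f * T[c][j]
    return d


def solve(A, b):
    # Solves A x = b via Gaussian elimination with partial pivoting.
    n = len(A)
    T = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (T[r][n] - sum(T[r][j] * x[j] for j in range(r + 1, n))) / T[r][r]
    return x


def quad_inv(A, h):
    # h^H A^{-1} h, computed by solving A x = h rather than inverting A.
    return sum(a.conjugate() * b for a, b in zip(h, solve(A, h))).real


def norm2(h):
    return sum(abs(x) ** 2 for x in h)


M, snr1, snr2 = 4, 10.0, 5.0   # snr_k plays the role of P*beta_k/(B*N0)
h1 = [math.sqrt(snr1) * a for a in ula(M, -math.pi / 20)]
h2 = [math.sqrt(snr2) * a for a in ula(M, math.pi / 20)]

# Corner point (6.42): user 1 decoded first with user 2 as noise, then user 2 after SIC.
r1_a = math.log2(1 + quad_inv(eye_plus(M, [h2]), h1))
r2_a = math.log2(1 + norm2(h2))
# Corner point (6.43): the opposite decoding order.
r1_b = math.log2(1 + norm2(h1))
r2_b = math.log2(1 + quad_inv(eye_plus(M, [h1]), h2))
# Right-hand side of (6.41).
sum_cap = math.log2(abs(det(eye_plus(M, [h1, h2]))))

print(f"(6.42): R1 = {r1_a:.3f}, R2 = {r2_a:.3f}, sum = {r1_a + r2_a:.3f}")
print(f"(6.43): R1 = {r1_b:.3f}, R2 = {r2_b:.3f}, sum = {r1_b + r2_b:.3f}")
print(f"bound in (6.41): {sum_cap:.3f}")
```

Both decoding orders give the same sum, equal to the determinant expression. The individual rates swap roles: the user that is decoded last obtains its single-user capacity.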


The rate region and the Pareto boundary are illustrated in Figure 6.12.
These results can be extended to the general case K > 2 by following the
same approach as in the NOMA case. The critical point to notice is that
the inequalities defining the rate region consider each non-empty subset
of the K users and specify that their sum rate must be lower than or
equal to the rate achieved by point-to-point MIMO when the
considered users transmit independent signals at their maximum power P.
Even the single-user rates have this structure, as can be seen from


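$$C\left(\frac{P}{BN_0}\|\mathbf{h}_k\|^2\right) = B\log_2\left(1 + \frac{P}{BN_0}\|\mathbf{h}_k\|^2\right) = B\log_2\left(\det\left(\mathbf{I}_M + \frac{P}{BN_0}\mathbf{h}_k\mathbf{h}_k^{\mathrm{H}}\right)\right). \tag{6.46}$$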


We obtain the following general result regarding the capacity region of uplink 
multi-user MIMO. 
Theorem 6.2. Consider a K-user discrete memoryless uplink multi-user
MIMO channel with the inputs $x_1,\ldots,x_K \in \mathbb{C}$ and the output $\mathbf{y} \in \mathbb{C}^M$ given
by

$$\mathbf{y} = \sum_{k=1}^{K}\mathbf{h}_k x_k + \mathbf{n}, \tag{6.47}$$

where $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is independent noise and $\mathbf{h}_1,\ldots,\mathbf{h}_K \in \mathbb{C}^M$ are
constant channel vectors known at the output. Suppose the input distributions
are feasible whenever $\mathbb{E}\{|x_k|^2\} \leq P/B$, for $k = 1,\ldots,K$, where $B$
is the bandwidth (and symbol rate). If $R_k$ denotes the rate achieved by user
$k$, then the capacity region is given by


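$$\mathcal{R} = \left\{(R_1,\ldots,R_K) : 0 \leq \sum_{k\in\mathcal{K}} R_k \leq B\log_2\left(\det\left(\mathbf{I}_M + \sum_{k\in\mathcal{K}}\frac{P}{BN_0}\mathbf{h}_k\mathbf{h}_k^{\mathrm{H}}\right)\right) \text{ for all } \mathcal{K} \subseteq \{1,\ldots,K\}\right\}. \tag{6.48}$$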


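By considering (6.48) with $\mathcal{K} = \{1,\ldots,K\}$, we obtain

$$\sum_{k=1}^{K}R_k \leq B\log_2\left(\det\left(\mathbf{I}_M + \sum_{k=1}^{K}\frac{P}{BN_0}\mathbf{h}_k\mathbf{h}_k^{\mathrm{H}}\right)\right) = B\log_2\left(\det\left(\mathbf{I}_M + \frac{P}{BN_0}\mathbf{H}\mathbf{H}^{\mathrm{H}}\right)\right), \tag{6.49}$$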


where the upper bound is the sum capacity. The last expression is obtained
using the notation $\mathbf{H} = [\mathbf{h}_1,\ldots,\mathbf{h}_K]$ and is identical to the rate expression
in (3.106) for a point-to-point MIMO system in which each transmit antenna sends a
different signal, using the precoding matrix $\mathbf{P} = \mathbf{I}_K$ and $\mathbf{Q} = \mathrm{diag}(P/B,\ldots,P/B)$.
This means that, from a total bit rate perspective, uplink multi-user MIMO
is like a point-to-point MIMO system where the transmit antennas do not
collaborate. However, the channel modeling will be very different.
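The sketch below makes the subset structure of (6.48) concrete. It enumerates all non-empty subsets of K = 3 users and evaluates the right-hand side of each inequality. The following choices are assumptions made only for illustration:

- the third user's SNR and angle;
- the ULA convention in `ula`;
- the normalization $P = BN_0 = B = 1$.

```python
import cmath
import itertools
import math


def ula(M, phi):
    # Assumed half-wavelength ULA response: [a]_m = exp(-j*pi*m*sin(phi)).
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


def eye_plus(M, vecs):
    # Returns I_M + sum_k v_k v_k^H as a list of rows.
    return [[(1.0 if r == c else 0.0) + sum(v[r] * v[c].conjugate() for v in vecs)
             for c in range(M)] for r in range(M)]


def det(A):
    # Determinant via Gaussian elimination with partial pivoting.
    n, T, d = len(A), [list(row) for row in A], 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        if p != c:
            T[c], T[p], d = T[p], T[c], -d
        d *= T[c][c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n):
                T[r][j] -= f * T[c][j]
    return d


M = 4
snrs = [10.0, 5.0, 8.0]                               # hypothetical P*beta_k/(B*N0)
angles = [-math.pi / 20, math.pi / 20, math.pi / 6]   # hypothetical user directions
h = [[math.sqrt(s) * a for a in ula(M, phi)] for s, phi in zip(snrs, angles)]

# One inequality of (6.48) per non-empty subset of users (B = 1, so bit/symbol).
bounds = {}
for size in range(1, len(h) + 1):
    for subset in itertools.combinations(range(len(h)), size):
        bounds[subset] = math.log2(abs(det(eye_plus(M, [h[k] for k in subset]))))

for subset, value in bounds.items():
    lhs = " + ".join(f"R{k + 1}" for k in subset)
    print(f"{lhs} <= {value:.3f} bit/symbol")
```

Each printed line is one inequality of the capacity region, and the bound for the full set is the sum capacity in (6.49). Enlarging a subset never lowers its bound. The full-set bound never exceeds the sum of the single-user bounds, because the users interfere with each other.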

To demonstrate how the use of multiple base station antennas affects the
shape of the rate region, we will continue the example with K = 2 users from
Figure 6.9. Recall that the users have different channel qualities: $P\beta_1/(BN_0) = 10$
and $P\beta_2/(BN_0) = 5$. Figure 6.13 shows the rate regions that multi-user MIMO
achieves with M = 2, M = 4, and M = 8 antennas, as well as M = 1,
which represents the previously considered NOMA setup. We assume the
base station has a ULA with half-wavelength antenna spacing. We use the
LOS channel model from (4.23) and let the users be located in two different
azimuth angle directions: $\varphi_1 = -\pi/20$ and $\varphi_2 = \pi/20$ (i.e., there is an 18°
angular spacing). As the number of antennas increases, the beamforming gain
increases the single-user capacities, thus pushing the horizontal and vertical
lines toward larger values. Moreover, the increase in multiplexing gain from
1 to 2 makes it possible to deal with the inter-user interference, so that the
sum rate grows more rapidly. This can be seen from the fact that the
diagonal part of the region becomes shorter the more antennas are used.
Eventually, both users can simultaneously achieve rates almost
equal to their respective single-user capacities. Notably, this benefit of
adding antennas (i.e., increasing M) persists even though the full multiplexing
gain is already achieved at M = 2.

Figure 6.13: Examples of uplink rate regions for K = 2 users with multi-user MIMO and a
varying number of antennas M, where M = 1 corresponds to NOMA. This is a continuation of
the example in Figure 6.9.

To understand the reason for this result, we can take a closer look at the
effective SNR in (6.42) that user 1 achieves when its signal is decoded first:


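$$\begin{aligned} P\mathbf{h}_1^{\mathrm{H}}\left(P\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}} + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_1 &= P\mathbf{h}_1^{\mathrm{H}}\left(\frac{1}{BN_0}\mathbf{I}_M - \frac{\mathbf{h}_2\mathbf{h}_2^{\mathrm{H}}}{(BN_0)^2\left(\frac{1}{P} + \frac{\|\mathbf{h}_2\|^2}{BN_0}\right)}\right)\mathbf{h}_1 \\ &= \underbrace{\frac{P}{BN_0}\|\mathbf{h}_1\|^2}_{\text{Single-user SNR}} \underbrace{\left(1 - \frac{P|\mathbf{h}_2^{\mathrm{H}}\mathbf{h}_1|^2}{\|\mathbf{h}_1\|^2\left(BN_0 + P\|\mathbf{h}_2\|^2\right)}\right)}_{\text{Reduction due to interference}}, \end{aligned} \tag{6.50}$$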


where the first equality follows from Lemma 2.3.⁴ The last expression reveals
that the effective SNR consists of two factors. The first factor is $\frac{P}{BN_0}\|\mathbf{h}_1\|^2$,
which is the SNR expression that appears in the single-user capacity expression.
The second factor also depends on the channel $\mathbf{h}_2$ of the interfering user and
determines the performance reduction caused by inter-user interference. This
factor takes a value between 0 and 1, representing the fraction of the single-user
SNR that is achieved under interference. This factor usually approaches 1 as
we increase the number of antennas, thereby making the diagonal part of the
rate region shorter and shorter, as observed in Figure 6.13.

⁴The matrix inversion lemma is utilized with $\mathbf{A} = BN_0\mathbf{I}_M$, $\mathbf{B} = \mathbf{h}_2$, $\mathbf{C} = P$, and $\mathbf{D} = \mathbf{h}_2^{\mathrm{H}}$.


Example 6.4. Show that the second factor in (6.50) goes to 1 as the number
of antennas $M \to \infty$ in the setup considered in Figure 6.13.

In the simulated scenario with a ULA and LOS channel conditions, we
have $\mathbf{h}_1 = \sqrt{\beta_1}\mathbf{a}_M(-\pi/20)$ and $\mathbf{h}_2 = \sqrt{\beta_2}\mathbf{a}_M(\pi/20)$ using the array response
vector expression in (4.49). We can utilize (4.50) and (4.52) to rewrite the
second factor in (6.50) as

$$\begin{aligned} 1 - \frac{P|\mathbf{h}_2^{\mathrm{H}}\mathbf{h}_1|^2}{\|\mathbf{h}_1\|^2\left(BN_0 + P\|\mathbf{h}_2\|^2\right)} &= 1 - \frac{P\beta_1\beta_2\sin^2\left(M\pi\sin\left(\frac{\pi}{20}\right)\right)}{M\beta_1\left(BN_0 + PM\beta_2\right)\sin^2\left(\pi\sin\left(\frac{\pi}{20}\right)\right)} \\ &\geq 1 - \frac{1}{M^2\sin^2\left(\pi\sin\left(\frac{\pi}{20}\right)\right)}, \end{aligned} \tag{6.51}$$

where the lower bound is obtained by replacing $\sin^2(M\pi\sin(\pi/20))$ with 1 and
$BN_0$ with 0. Both operations can only enlarge the subtracted term.
The lower bound goes to 1 as $M \to \infty$. The mathematical explanation
is that the directions $\mathbf{h}_1/\|\mathbf{h}_1\|$ and $\mathbf{h}_2/\|\mathbf{h}_2\|$ of the channel vectors become
increasingly orthogonal as more antennas are added to the ULA, making it
easier to suppress interference without sacrificing much of the desired signal.
The physical explanation is that the beamwidth shrinks with M.
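The bound in (6.51) can also be checked numerically. This sketch computes the exact second factor of (6.50) and the lower bound of (6.51) for growing M. It assumes:

- the ULA convention $[\mathbf{a}_M(\varphi)]_m = e^{-j\pi m\sin(\varphi)}$ in the helper `ula`;
- users at $\mp\pi/20$;
- $P\beta_2/(BN_0) = 5$.

It also confirms that, under this convention, $|\mathbf{a}_M(\pi/20)^{\mathrm{H}}\mathbf{a}_M(-\pi/20)|^2 = \sin^2(M\pi\sin(\pi/20))/\sin^2(\pi\sin(\pi/20))$.

```python
import cmath
import math


def ula(M, phi):
    # Assumed half-wavelength ULA response: [a]_m = exp(-j*pi*m*sin(phi)), so ||a||^2 = M.
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


snr2 = 5.0                     # P*beta_2/(B*N0) in the running example
u = math.sin(math.pi / 20)     # users at -pi/20 and +pi/20

factor, bound, corr_gap = {}, {}, {}
for M in [1, 2, 4, 8, 16, 32, 64]:
    a1, a2 = ula(M, -math.pi / 20), ula(M, math.pi / 20)
    corr = abs(sum(x.conjugate() * y for x, y in zip(a2, a1))) ** 2   # |a2^H a1|^2
    closed = math.sin(M * math.pi * u) ** 2 / math.sin(math.pi * u) ** 2
    corr_gap[M] = abs(corr - closed)
    # Second factor of (6.50): beta_1 cancels and P*beta_2/(B*N0) = snr2.
    factor[M] = 1 - snr2 * corr / (M * (1 + snr2 * M))
    # Lower bound in (6.51); it is negative (hence uninformative) for small M.
    bound[M] = 1 - 1 / (M ** 2 * math.sin(math.pi * u) ** 2)
    print(f"M = {M:2d}: factor = {factor[M]:.4f}, lower bound = {bound[M]:.4f}")
```

The bound is negative for M ≤ 2 but approaches 1 quickly, and the exact factor is always at least as large.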


Interference is the performance-limiting factor when the SNR is high, while
it drowns in the noise when the SNR is low. The last term in (6.50), which is
subtracted from 1, can be upper bounded as

$$\frac{P|\mathbf{h}_2^{\mathrm{H}}\mathbf{h}_1|^2}{\|\mathbf{h}_1\|^2\left(BN_0 + P\|\mathbf{h}_2\|^2\right)} \leq \frac{|\mathbf{h}_2^{\mathrm{H}}\mathbf{h}_1|^2}{\|\mathbf{h}_1\|^2\|\mathbf{h}_2\|^2}, \tag{6.52}$$

where equality is approached at high SNR, as $P/(BN_0) \to \infty$. Propagation
environments where this term vanishes when using many antennas are said to
provide favorable propagation [88] because the effective SNR then approaches
the single-user SNR. This property can be formalized as follows.


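Definition 6.2. The pair of channels $\mathbf{h}_1, \mathbf{h}_2 \in \mathbb{C}^M$ is said to provide favorable
propagation if

$$\frac{|\mathbf{h}_1^{\mathrm{H}}\mathbf{h}_2|}{\|\mathbf{h}_1\|\|\mathbf{h}_2\|} \to 0 \quad \text{as } M \to \infty. \tag{6.53}$$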
This definition aims to evaluate whether a given channel model has the desired
property that the performance loss from interference reduces gradually as we
add more antennas. However, taking the limit $M \to \infty$ is just a mathematical
curiosity and should not be interpreted literally. In practice, we typically only
need 10 to 100 antennas to make the impact of interference negligibly small
for most kinds of channels. Moreover, most channel models considered in
this book were derived under a far-field assumption, which will eventually
be invalidated as the aperture length increases (i.e., the Fraunhofer distance
grows with M). A precise analysis is more complicated but can be found in
[89]. In summary, the favorable propagation property simply says: interference
is easy to suppress if we have many antennas.
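A quick Monte Carlo experiment illustrates Definition 6.2 for a channel model other than LOS. This sketch assumes i.i.d. Rayleigh fading, $\mathbf{h}_1, \mathbf{h}_2 \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_M)$. Under that model, the squared ratio $|\mathbf{h}_1^{\mathrm{H}}\mathbf{h}_2|^2/(\|\mathbf{h}_1\|^2\|\mathbf{h}_2\|^2)$ has mean exactly $1/M$. The seed and the number of trials are arbitrary choices.

```python
import math
import random


def rayleigh(M, rng):
    # One realization of h ~ CN(0, I_M) (i.i.d. Rayleigh fading).
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(M)]


rng = random.Random(1)   # fixed seed so that the run is reproducible
trials = 1000
avg = {}
for M in [4, 16, 64, 128]:
    total = 0.0
    for _ in range(trials):
        h1, h2 = rayleigh(M, rng), rayleigh(M, rng)
        ip = abs(sum(x.conjugate() * y for x, y in zip(h1, h2))) ** 2
        total += ip / (sum(abs(x) ** 2 for x in h1) * sum(abs(x) ** 2 for x in h2))
    avg[M] = total / trials   # sample mean of the squared ratio in (6.53)
    print(f"M = {M:3d}: mean squared ratio = {avg[M]:.4f}, 1/M = {1 / M:.4f}")
```

The printed means track $1/M$. The ratio in (6.53) therefore vanishes as antennas are added, which is exactly the favorable propagation property.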


6.3.4 Uplink Multi-User MIMO with Linear Processing 


The last two sections demonstrated how SIC could be utilized in NOMA and 
multi-user MIMO to achieve Pareto optimal operating points. Unfortunately, 
this non-linear processing scheme has some practical drawbacks. Firstly, the 
sequential decoding of the users’ signals leads to a decoding delay that grows 
proportionally to the number of users. Secondly, practical data packets have 
a finite length and, thus, a non-zero probability of decoding errors. When 
one user’s data is decoded incorrectly, the interference cancellation will fail, 
which implies that the users whose data are decoded later in the sequence get 
more interference rather than less. This most likely leads to further decoding 
errors, which is called error propagation. Thirdly, the individual users’ data 
rates must be selected jointly based on the decoding order. Hence, if one 
user experiences a sudden change in channel conditions, all user rates must 
be updated accordingly. If we omit the SIC step instead, Pareto optimality 
cannot be ensured, and most users will experience a rate reduction. However, 
the practical benefits are that the receiver can now decode all the users’ data 
simultaneously (e.g., using a multi-core processor), decoding errors for one 
user will not cause decoding errors for other users, and the rates can be 
selected independently for the different users based on only their individual 
SINR. In this section, we will analyze multi-user MIMO with such linear 
processing, where each user’s signal is essentially decoded as if it is the first 
to be decoded. In particular, we will investigate under what conditions the 
performance loss is slight when omitting the SIC procedure. 

We consider a discrete memoryless channel with K single-antenna user
devices and a receiving base station equipped with $M \geq 2$ antennas. Setups
with $M \geq K$ are particularly important, but the performance analysis does not
require that assumption. The users transmit simultaneously over a bandwidth
of B Hz and their transmit powers are $P_k^{\mathrm{ul}} \in [0, P]$, for $k = 1,\ldots,K$, where $P$
is the maximum power. The received signal $\mathbf{y}[l] \in \mathbb{C}^M$ at the discrete time $l$ is

$$\mathbf{y}[l] = \sum_{k=1}^{K}\mathbf{h}_k x_k[l] + \mathbf{n}[l], \tag{6.54}$$

where $x_k[l]$ is the input signal from user $k$, for $k = 1,\ldots,K$. The energy
per symbol is $P_k^{\mathrm{ul}}/B$. We assume the use of Gaussian codebooks, such that
$x_k[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_k^{\mathrm{ul}}/B)$. The channel vector from user $k$ is denoted by $\mathbf{h}_k \in \mathbb{C}^M$,
while $\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the independent receiver noise. This setup is the same as in
the previous section and is illustrated in Figure 6.14. The key difference
is how the receiver processing is carried out: each user's data is
decoded separately and, potentially, in parallel.

Figure 6.14: A discrete memoryless uplink multi-user MIMO channel with the inputs $x_k[l]$ for
$k = 1,\ldots,K$ and the output $\mathbf{y}[l] = \sum_{k=1}^{K}\mathbf{h}_k x_k[l] + \mathbf{n}[l] \in \mathbb{C}^M$, where $l$ is a discrete-time index,
$\mathbf{h}_k \in \mathbb{C}^M$ is the channel vector from user $k$, and $\mathbf{n}[l]$ is the independent Gaussian receiver noise.
The user signals are decoded in parallel by treating interference as noise.

Since the channel is memoryless, we can remove the time indices from 
(6.54). However, we must remember that the channel vectors are constant, 
while the signals and noise are random variables that take new independent 
realizations at every time instance. When considering an arbitrary user k at 
an arbitrary time, the received signal in (6.54) can be rewritten as 


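$$\mathbf{y} = \underbrace{\mathbf{h}_k x_k}_{\text{Desired signal}} + \underbrace{\sum_{i=1, i\neq k}^{K}\mathbf{h}_i x_i}_{\text{Interference}} + \mathbf{n} = \mathbf{h}_k x_k + \bar{\mathbf{n}}_k, \tag{6.55}$$

where $\bar{\mathbf{n}}_k = \sum_{i=1, i\neq k}^{K}\mathbf{h}_i x_i + \mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C}_k)$ is a colored noise term with the
non-diagonal covariance matrix

$$\mathbf{C}_k = \mathbb{E}\left\{\bar{\mathbf{n}}_k\bar{\mathbf{n}}_k^{\mathrm{H}}\right\} = \sum_{i=1, i\neq k}^{K}\frac{P_i^{\mathrm{ul}}}{B}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} + N_0\mathbf{I}_M. \tag{6.56}$$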
i=1ixk 


Hence, if we refrain from decoding the other users' interfering signals and instead treat
them as extra noise, we can view (6.55) as a SIMO channel with the additive
colored noise $\bar{\mathbf{n}}_k$. We recall from Section 3.2 that we can obtain an estimate
$\hat{x}_k$ of $x_k$ by projecting $\mathbf{y}$ onto a scalar value using a receive combining vector.
This is a type of linear processing. In this section, we denote the receive
combining vector associated with user $k$ as $\mathbf{w}_k \in \mathbb{C}^M$, and we then obtain

$$\hat{x}_k = \mathbf{w}_k^{\mathrm{H}}\mathbf{y} = \mathbf{w}_k^{\mathrm{H}}\mathbf{h}_k x_k + \mathbf{w}_k^{\mathrm{H}}\bar{\mathbf{n}}_k. \tag{6.57}$$


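We notice that (6.57) is effectively a memoryless SISO channel of the kind
in (2.130) with the received signal $y = \hat{x}_k$, the effective channel $h = \mathbf{w}_k^{\mathrm{H}}\mathbf{h}_k$,
and the processed noise $n = \mathbf{w}_k^{\mathrm{H}}\bar{\mathbf{n}}_k \sim \mathcal{N}_{\mathbb{C}}(0, \mathbf{w}_k^{\mathrm{H}}\mathbf{C}_k\mathbf{w}_k)$. Hence, it follows from
Corollary 2.1 that an achievable data rate (in bit/s) is

$$R_k = B\log_2\left(1 + \frac{\frac{P_k^{\mathrm{ul}}}{B}|\mathbf{w}_k^{\mathrm{H}}\mathbf{h}_k|^2}{\sum_{i=1, i\neq k}^{K}\frac{P_i^{\mathrm{ul}}}{B}|\mathbf{w}_k^{\mathrm{H}}\mathbf{h}_i|^2 + N_0\|\mathbf{w}_k\|^2}\right) = B\log_2\left(1 + \frac{\frac{P_k^{\mathrm{ul}}}{B}|\mathbf{w}_k^{\mathrm{H}}\mathbf{h}_k|^2}{\mathbf{w}_k^{\mathrm{H}}\mathbf{C}_k\mathbf{w}_k}\right). \tag{6.58}$$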
This rate depends on the selection of the receive combining vector $\mathbf{w}_k$, which
appears in the effective SNR term $\frac{P_k^{\mathrm{ul}}}{B}|\mathbf{w}_k^{\mathrm{H}}\mathbf{h}_k|^2/(\mathbf{w}_k^{\mathrm{H}}\mathbf{C}_k\mathbf{w}_k)$. More precisely, the
direction of the combining vector determines the SNR, while the length
of the vector is immaterial since it affects the numerator and denominator
equally. With white noise, as in Section 3.2, the direction of the
receive combining vector does not affect the noise variance $\mathbf{w}_k^{\mathrm{H}}\mathbf{C}_k\mathbf{w}_k$ (for a fixed norm), since $\mathbf{C}_k$
is then a scaled identity matrix. In that case, the SNR is maximized by selecting
$\mathbf{w}_k$ as a vector parallel to the channel $\mathbf{h}_k$, which we previously called MRC.
The situation changes with the colored noise $\bar{\mathbf{n}}_k$, because the
noise (or rather the interference) is then stronger in some directions and weaker in
others. For example, if we compute an eigendecomposition of $\mathbf{C}_k$, eigenvectors
associated with large eigenvalues represent strong directions, and eigenvectors
associated with small eigenvalues represent weaker directions. Hence, the
receive combining that maximizes the effective SNR must balance maximizing
the numerator and minimizing the denominator.

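To identify the combining vector that maximizes the effective SNR in
(6.58), we can divide the receive combining vector into two parts: one part
that whitens the noise and one that performs receive combining
after the whitening. Recall from the previous section that the colored noise
is whitened by multiplying the received signal with $\mathbf{C}_k^{-1/2}$;
thus, we set

$$\mathbf{w}_k = \mathbf{C}_k^{-1/2}\bar{\mathbf{w}}_k, \tag{6.59}$$

where $\bar{\mathbf{w}}_k \in \mathbb{C}^M$ is the effective combining vector after the whitening. There
is a one-to-one mapping between $\mathbf{w}_k$ and $\bar{\mathbf{w}}_k$, so we can make this assumption
without any loss of optimality. By substituting (6.59) into (6.58), we
obtain

$$R_k = B\log_2\left(1 + \frac{\frac{P_k^{\mathrm{ul}}}{B}|\bar{\mathbf{w}}_k^{\mathrm{H}}\mathbf{C}_k^{-1/2}\mathbf{h}_k|^2}{\bar{\mathbf{w}}_k^{\mathrm{H}}\mathbf{C}_k^{-1/2}\mathbf{C}_k\mathbf{C}_k^{-1/2}\bar{\mathbf{w}}_k}\right) = B\log_2\left(1 + \frac{\frac{P_k^{\mathrm{ul}}}{B}|\bar{\mathbf{w}}_k^{\mathrm{H}}\mathbf{C}_k^{-1/2}\mathbf{h}_k|^2}{\|\bar{\mathbf{w}}_k\|^2}\right). \tag{6.60}$$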
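We can now notice that the variance of the whitened noise only depends
on the squared norm $\|\bar{\mathbf{w}}_k\|^2$ and not on the direction of $\bar{\mathbf{w}}_k$. Hence, we
can maximize the effective SNR in (6.60) by making $\bar{\mathbf{w}}_k$ parallel to the
effective channel $\mathbf{C}_k^{-1/2}\mathbf{h}_k$; that is, by applying MRC to the effective channel
with $\bar{\mathbf{w}}_k = \mathbf{C}_k^{-1/2}\mathbf{h}_k$. In conclusion, the effective SNR is maximized by selecting
the receive combining vector as

$$\mathbf{w}_k = \underbrace{\mathbf{C}_k^{-1/2}}_{\text{Whitening}}\underbrace{\mathbf{C}_k^{-1/2}\mathbf{h}_k}_{\text{MRC}} = \mathbf{C}_k^{-1}\mathbf{h}_k = \left(\sum_{i=1, i\neq k}^{K}\frac{P_i^{\mathrm{ul}}}{B}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k. \tag{6.61}$$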


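By substituting this vector into the rate expression in (6.58), we obtain

$$R_k = B\log_2\left(1 + \frac{\frac{P_k^{\mathrm{ul}}}{B}|\mathbf{h}_k^{\mathrm{H}}\mathbf{C}_k^{-1}\mathbf{h}_k|^2}{\mathbf{h}_k^{\mathrm{H}}\mathbf{C}_k^{-1}\mathbf{C}_k\mathbf{C}_k^{-1}\mathbf{h}_k}\right) = B\log_2\left(1 + \frac{P_k^{\mathrm{ul}}}{B}\mathbf{h}_k^{\mathrm{H}}\mathbf{C}_k^{-1}\mathbf{h}_k\right). \tag{6.62}$$

This is a generalization of (6.35) to the case with an arbitrary number of
users, whose signals are decoded separately by treating all interfering signals
as colored noise. We call this a linear processing scheme since the receiver only
computes the inner product between the received signal $\mathbf{y}$ and the combining
vector $\mathbf{w}_k$ before decoding the data.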
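The claims around (6.58) to (6.62) can be checked numerically:

- the LMMSE vector $\mathbf{C}_k^{-1}\mathbf{h}_k$ attains the closed-form SINR in (6.62);
- it is at least as good as MRC;
- rescaling it leaves the SINR unchanged.

In this sketch, the channels are random complex Gaussian vectors, the values of $P_k^{\mathrm{ul}}/B$ are hypothetical, and $N_0 = 1$. The helper `solve` is a Gaussian-elimination routine written only for this example; a numerical library would normally be used instead.

```python
import random


def solve(A, b):
    # Solves A x = b via Gaussian elimination with partial pivoting.
    n = len(A)
    T = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (T[r][n] - sum(T[r][j] * x[j] for j in range(r + 1, n))) / T[r][r]
    return x


def dot(a, b):
    # Inner product a^H b.
    return sum(x.conjugate() * y for x, y in zip(a, b))


rng = random.Random(0)
M, K, N0 = 4, 3, 1.0
p = [1.0, 2.0, 0.5]   # hypothetical values of P_k^ul / B
h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)] for _ in range(K)]


def sinr(w, k):
    # Effective SNR (SINR) inside (6.58) for an arbitrary combining vector w.
    num = p[k] * abs(dot(w, h[k])) ** 2
    den = sum(p[i] * abs(dot(w, h[i])) ** 2 for i in range(K) if i != k)
    return num / (den + N0 * dot(w, w).real)


def cov(k):
    # C_k from (6.56): interference-plus-noise covariance seen by user k.
    return [[sum(p[i] * h[i][r] * h[i][c].conjugate() for i in range(K) if i != k)
             + (N0 if r == c else 0.0) for c in range(M)] for r in range(M)]


for k in range(K):
    w = solve(cov(k), h[k])              # LMMSE combining (6.61): C_k^{-1} h_k
    closed = p[k] * dot(h[k], w).real    # closed form from (6.62)
    print(f"user {k + 1}: LMMSE {sinr(w, k):.4f} (closed form {closed:.4f}), "
          f"MRC {sinr(h[k], k):.4f}")
```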

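The rate-maximizing receive combining vector in (6.61) is referred to as
LMMSE combining because we can also derive it by looking for the vector
that minimizes the MSE $\mathbb{E}\{|x_k - \hat{x}_k|^2\}$ between the transmitted signal and
the estimate in (6.57). Such a problem was solved in Example 3.4, except that
there were no user indices in that case. By substituting $q = P_k^{\mathrm{ul}}/B$, $\mathbf{h} = \mathbf{h}_k$,
and $\mathbf{C} = \mathbf{C}_k$ into (3.34), we obtain the MSE-minimizing combining vector

$$\mathbf{w}_k = \frac{P_k^{\mathrm{ul}}}{B}\left(\frac{P_k^{\mathrm{ul}}}{B}\mathbf{h}_k\mathbf{h}_k^{\mathrm{H}} + \mathbf{C}_k\right)^{-1}\mathbf{h}_k = \frac{P_k^{\mathrm{ul}}}{P_k^{\mathrm{ul}}\mathbf{h}_k^{\mathrm{H}}\mathbf{C}_k^{-1}\mathbf{h}_k + B}\mathbf{C}_k^{-1}\mathbf{h}_k, \tag{6.63}$$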


which is equal to (6.61) except for the extra scaling factor $P_k^{\mathrm{ul}}/(P_k^{\mathrm{ul}}\mathbf{h}_k^{\mathrm{H}}\mathbf{C}_k^{-1}\mathbf{h}_k + B)$.
Strictly speaking, only the combining vector in (6.63) minimizes the MSE;
however, both expressions are commonly referred to as LMMSE combining.
The reason is that both vectors maximize the rate, since the SINR depends
only on the direction of the combining vector, not on its length. The rate
expression originates from a mutual information expression that ignores the
scaling because it implicitly assumes an optimal decoder, which will scale
the received signal on its own, thereby compensating for whatever undesired
scaling has been applied earlier. The preferred scaling depends on the decoding
algorithm but is likely similar to the true LMMSE combining in (6.63).
We can summarize the results with linear processing as follows. 
Corollary 6.1. Consider the discrete memoryless uplink multi-user MIMO
channel in Figure 6.14 where the K users have the respective inputs $x_k \in \mathbb{C}$,
$k = 1,\ldots,K$, and the output $\mathbf{y} \in \mathbb{C}^M$ is given by

$$\mathbf{y} = \sum_{k=1}^{K}\mathbf{h}_k x_k + \mathbf{n}, \tag{6.64}$$

where $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is independent noise. Suppose $\mathbf{h}_k \in \mathbb{C}^M$ is a constant
vector known at the output, for $k = 1,\ldots,K$. If each input signal is independently
distributed as $x_k \sim \mathcal{N}_{\mathbb{C}}(0, P_k^{\mathrm{ul}}/B)$ and decoded separately by treating
inter-user interference as noise, the largest achievable rate for user $k$ is

$$R_k = B\log_2\left(1 + P_k^{\mathrm{ul}}\mathbf{h}_k^{\mathrm{H}}\left(\sum_{i=1, i\neq k}^{K}P_i^{\mathrm{ul}}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k\right) \tag{6.65}$$

and is achieved by using LMMSE combining.


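While it is possible to operate a multi-user MIMO system without utilizing
SIC, the pertinent question is: how large is the performance loss? The achievable
rate region with linear processing can be characterized by considering all
the rate tuples $(R_1,\ldots,R_K)$ that can be obtained for different selections of
the transmit powers:

$$\mathcal{R} = \left\{(R_1,\ldots,R_K) : R_k = B\log_2\left(1 + P_k^{\mathrm{ul}}\mathbf{h}_k^{\mathrm{H}}\left(\sum_{i=1, i\neq k}^{K}P_i^{\mathrm{ul}}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k\right) \text{ for } k = 1,\ldots,K, \text{ for some } P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}} \in [0, P]\right\}. \tag{6.66}$$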


To compare this rate region with the one achieved with non-linear processing,
we continue the example from Figure 6.13. Recall that we considered
K = 2 users with different channel qualities: $P\beta_1/(BN_0) = 10$ and $P\beta_2/(BN_0) = 5$.
Figures 6.15(a) and 6.15(b) show the rate regions obtained with M = 4 and
M = 8 antennas, respectively. The regions called “non-linear” are achieved
using SIC and are the same as those in Figure 6.13, while the regions called
“linear” are computed using (6.66). The boundary points are obtained by
assigning the maximum power P to one of the users and varying the other
user's power from 0 to P. The corner point in the middle of the boundary is
achieved by $P_1^{\mathrm{ul}} = P_2^{\mathrm{ul}} = P$. As expected, linear processing results in a smaller
region than non-linear processing, but the difference reduces as we increase
the number of antennas. For example, the loss in sum rate from using linear
processing is 4% with M = 4 but only 0.4% with M = 8. The explanation is
the favorable propagation property discussed in the previous section; that is,
the directions of the users' channel vectors become increasingly different as M
grows, making it possible to suppress interference using LMMSE combining
without sacrificing much of the desired signal power.

Figure 6.15: Examples of uplink rate regions with K = 2 users when multi-user MIMO is used
with either non-linear or linear processing: (a) M = 4 antennas; (b) M = 8 antennas. The region
obtained in (6.66) is called “linear” and its convex hull is also shown. This is a continuation of
the example from Figure 6.13.


All the rate regions we have considered earlier in this chapter are convex 
sets, meaning that if we pick any two operating points in the region and draw 
a line between them, the line stays within the region. The pentagon shape in 
the two-user case with non-linear processing was achieved by drawing lines 
between the operating points in (6.42) and (6.43), which are achieved with 
different decoding orders, and the single-user capacities. This procedure was 
called time-sharing. If we allow time-sharing when using linear processing, we 
can replace the region in (6.66) with its convex hull. This way of expanding 
the region is also shown in Figure 6.15. We notice that the benefit of switching 
between different operating points over time shrinks as more antennas are 
added to the base station. Hence, having base stations with many antennas 
will increase the sum rate and simplify the system operation. 
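The sum-rate comparison behind Figure 6.15 can be reproduced approximately for the corner point $P_1^{\mathrm{ul}} = P_2^{\mathrm{ul}} = P$. This sketch compares the LMMSE rates from (6.65) with the SIC sum capacity in (6.41). It assumes:

- the ULA convention in `ula`;
- users at $\mp\pi/20$;
- $P\beta_1/(BN_0) = 10$ and $P\beta_2/(BN_0) = 5$;
- the normalization $P = BN_0 = B = 1$.

It is not the code used to produce the figure.

```python
import cmath
import math


def ula(M, phi):
    # Assumed half-wavelength ULA response: [a]_m = exp(-j*pi*m*sin(phi)).
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


def eye_plus(M, vecs):
    # Returns I_M + sum_k v_k v_k^H as a list of rows.
    return [[(1.0 if r == c else 0.0) + sum(v[r] * v[c].conjugate() for v in vecs)
             for c in range(M)] for r in range(M)]


def det(A):
    # Determinant via Gaussian elimination with partial pivoting.
    n, T, d = len(A), [list(row) for row in A], 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        if p != c:
            T[c], T[p], d = T[p], T[c], -d
        d *= T[c][c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n):
                T[r][j] -= f * T[c][j]
    return d


def solve(A, b):
    # Solves A x = b via Gaussian elimination with partial pivoting.
    n = len(A)
    T = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (T[r][n] - sum(T[r][j] * x[j] for j in range(r + 1, n))) / T[r][r]
    return x


def quad_inv(A, h):
    # h^H A^{-1} h for a Hermitian positive definite A.
    return sum(a.conjugate() * b for a, b in zip(h, solve(A, h))).real


snrs = [10.0, 5.0]                       # P*beta_k/(B*N0) with P = B*N0 = 1
angles = [-math.pi / 20, math.pi / 20]

loss = {}
for M in [4, 8]:
    h = [[math.sqrt(s) * a for a in ula(M, phi)] for s, phi in zip(snrs, angles)]
    # Non-linear processing (SIC): maximum sum rate from (6.41), in bit/symbol.
    nonlinear = math.log2(abs(det(eye_plus(M, h))))
    # Linear processing: corner point P_1 = P_2 = P of (6.66), each rate from (6.65).
    linear = sum(math.log2(1 + quad_inv(eye_plus(M, [h[1 - k]]), h[k])) for k in range(2))
    loss[M] = 1 - linear / nonlinear
    print(f"M = {M}: non-linear {nonlinear:.3f}, linear {linear:.3f}, "
          f"loss {100 * loss[M]:.1f}%")
```

Under these assumptions, the printed losses come out close to the 4% and 0.4% quoted above for M = 4 and M = 8.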


To further elaborate on these properties, Figure 6.16 shows the sum rate in
a setup with K = 4 users when the SNR is varied. The base station is equipped
with a ULA with half-wavelength-spaced antennas, and the users have equal
SNRs but different azimuth angles of arrival: $-\pi/16$, $-\pi/32$, $0$, $+\pi/24$.
We compare the sum rates achieved with multi-user MIMO with non-linear
and linear processing, as well as OMA/FDMA, where each user is allocated a
quarter of the bandwidth. We will not specify the bandwidth in this example
but plot the sum rate in bit/symbol to keep it general. Figure 6.16(a) shows the
sum rate with M = 10 base station antennas. We notice that the multiplexing
gain of min(M, K) = K results in a much higher sum rate when using
multi-user MIMO than OMA. There is a substantial gap between linear and
non-linear processing, which might be surprising since M = 8 antennas
resulted in only a 0.4% sum-rate difference in the previous example. The reason for
the wider gap in this example is that we have doubled the number of users.
In Figure 6.15(b), we had the antenna-user ratio M/K = 4, and now we only
have M/K = 10/4 = 2.5. However, if we also double the number of antennas,
we obtain Figure 6.16(b), where the antenna-user ratio is 5. We notice that
the performance gap between linear and non-linear processing is once again
negligible, thanks to more favorable propagation that limits interference.


In summary, multi-user MIMO with linear processing performs almost the
same as its non-linear counterpart when the base station has around five times
as many antennas as there are single-antenna users. This operating regime
is often called Massive MIMO. A typical 5G NR mid-band configuration is
M = 64 and 1 ≤ K ≤ 16 (depending on the traffic load), which results in
antenna-user ratios of 4 to 64, for which linear processing works well. We
refer to the textbook [1] for a deeper theoretical analysis of Massive MIMO,
focusing on cellular networks with inter-cell interference and fading channels.
Figure 6.16: The sum rate in a multi-user MIMO system with K = 4 users and either non-linear
or linear processing: (a) M = 10 antennas; (b) M = 20 antennas. All the users have the same SNR
and LOS channels with different azimuth angles: $-\pi/16$, $-\pi/32$, $0$, $+\pi/24$. OMA/FDMA, where the
users are allocated equal fractions of the bandwidth, is shown as a reference and does not provide
any multiplexing gain.
Example 6.5. What are the ergodic rates in a multi-user MIMO system with
fast-fading channels, linear processing, and perfect CSI at the receiver?

The ergodic capacity of SIMO channels with white noise was studied
in Section 5.4.1, where it was concluded that it is the mean value of the
conditional capacity achieved for a given channel realization. Following
that principle, in a fast-fading multi-user scenario where $\mathbf{h}_1,\ldots,\mathbf{h}_K$ are
realizations from stationary and ergodic random processes, the ergodic rate
of user $k$ is obtained by taking the mean value of (6.65):

$$R_k = B\,\mathbb{E}\left\{\log_2\left(1 + P_k^{\mathrm{ul}}\mathbf{h}_k^{\mathrm{H}}\left(\sum_{i=1, i\neq k}^{K}P_i^{\mathrm{ul}}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k\right)\right\}. \tag{6.67}$$

The rate region can then be defined as in (6.66), with each rate replaced by its mean value.
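The expectation in (6.67) is usually evaluated by Monte Carlo simulation. This sketch averages the LMMSE rate over channel realizations and compares it with the interference-free rate $\mathbb{E}\{\log_2(1 + \frac{P}{BN_0}\|\mathbf{h}_k\|^2)\}$. All setup values are hypothetical choices:

- K = 2 users with i.i.d. Rayleigh fading and $\beta_k = 1$;
- an equal SNR $P/(BN_0) = 10$;
- M = 4 antennas.

```python
import math
import random


def solve(A, b):
    # Solves A x = b via Gaussian elimination with partial pivoting.
    n = len(A)
    T = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (T[r][n] - sum(T[r][j] * x[j] for j in range(r + 1, n))) / T[r][r]
    return x


def dot(a, b):
    # Inner product a^H b.
    return sum(x.conjugate() * y for x, y in zip(a, b))


rng = random.Random(2)
M, snr, trials = 4, 10.0, 2000   # hypothetical setup with K = 2 users
s = math.sqrt(0.5)


def draw():
    # One i.i.d. Rayleigh realization, scaled by sqrt(snr) so that P = B*N0 = 1.
    return [math.sqrt(snr) * complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(M)]


acc_lmmse, acc_single = [0.0, 0.0], [0.0, 0.0]
for _ in range(trials):
    h = [draw(), draw()]
    for k in range(2):
        g = h[1 - k]   # channel of the interfering user
        C = [[g[r] * g[c].conjugate() + (1.0 if r == c else 0.0) for c in range(M)]
             for r in range(M)]
        acc_lmmse[k] += math.log2(1 + dot(h[k], solve(C, h[k])).real)   # inside (6.67)
        acc_single[k] += math.log2(1 + dot(h[k], h[k]).real)            # no interference
ergodic = [a / trials for a in acc_lmmse]
single = [a / trials for a in acc_single]
for k in range(2):
    print(f"user {k + 1}: ergodic LMMSE rate {ergodic[k]:.3f}, "
          f"interference-free {single[k]:.3f} bit/symbol")
```

Interference can only reduce the rate in every realization, so the ergodic LMMSE rate stays below the interference-free value. By Jensen's inequality, both stay below $\log_2(1 + \frac{P}{BN_0}M)$.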


6.3.5 Alternative Linear Uplink Processing Schemes 


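Although LMMSE combining is the rate-maximizing linear processing scheme,
other schemes are commonly considered within the area of multi-user MIMO.
There are situations where MRC works almost as well as LMMSE
combining, and there are other situations where a scheme called zero-forcing
(ZF) is nearly optimal. These situations are connected with the SNR at which
the system operates. If we consider the direction of the LMMSE combining
vector in (6.63), we notice that

$$\frac{\mathbf{w}_k}{\|\mathbf{w}_k\|} = \frac{\left(\sum_{i=1, i\neq k}^{K}\frac{P_i^{\mathrm{ul}}}{B}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k}{\left\|\left(\sum_{i=1, i\neq k}^{K}\frac{P_i^{\mathrm{ul}}}{B}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k\right\|} \to \frac{(N_0\mathbf{I}_M)^{-1}\mathbf{h}_k}{\left\|(N_0\mathbf{I}_M)^{-1}\mathbf{h}_k\right\|} = \frac{\mathbf{h}_k}{\|\mathbf{h}_k\|}$$

as $P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}} \to 0$. This is the same direction as when using MRC, which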


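proves that LMMSE combining turns into MRC when all the users experience
low SNRs. The intuition behind this result is that the inter-user interference
is much weaker than the noise in this situation; thus, every user effectively
experiences a SIMO channel with only receiver noise. If we substitute MRC
with $\mathbf{w}_k^{\mathrm{MRC}} = \mathbf{h}_k/\|\mathbf{h}_k\|$ into the general rate expression in (6.58), we obtain

$$R_k^{\mathrm{MRC}} = B\log_2\left(1 + \frac{\frac{P_k^{\mathrm{ul}}}{B}\|\mathbf{h}_k\|^2}{\sum_{i=1, i\neq k}^{K}\frac{P_i^{\mathrm{ul}}}{B}\frac{|\mathbf{h}_k^{\mathrm{H}}\mathbf{h}_i|^2}{\|\mathbf{h}_k\|^2} + N_0}\right). \tag{6.68}$$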


This is the achievable rate when using MRC in a multi-user MIMO system.
The interference term $\frac{P_i^{\mathrm{ul}}}{B}\frac{|\mathbf{h}_k^{\mathrm{H}}\mathbf{h}_i|^2}{\|\mathbf{h}_k\|^2}$ in the denominator resembles the expression
that appeared in the favorable propagation definition in (6.53); thus, MRC is
also considered to work well in situations with very many antennas [90].
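The low-SNR optimality of MRC can be checked with the K = 4, M = 10 LOS setup of Figure 6.16(a). This sketch compares the MRC rates in (6.68) with the LMMSE rates in (6.65). It assumes:

- $\beta_k = 1$;
- the ULA convention in `ula`;
- equal SNRs $P/(BN_0)$;
- the normalization $P = BN_0 = B = 1$.

```python
import cmath
import math


def ula(M, phi):
    # Assumed half-wavelength ULA response: [a]_m = exp(-j*pi*m*sin(phi)).
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


def solve(A, b):
    # Solves A x = b via Gaussian elimination with partial pivoting.
    n = len(A)
    T = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (T[r][n] - sum(T[r][j] * x[j] for j in range(r + 1, n))) / T[r][r]
    return x


def dot(a, b):
    # Inner product a^H b.
    return sum(x.conjugate() * y for x, y in zip(a, b))


M = 10
angles = [-math.pi / 16, -math.pi / 32, 0.0, math.pi / 24]   # from Figure 6.16
a = [ula(M, phi) for phi in angles]
K = len(a)


def sum_rates(snr):
    # Equal SNRs; normalized so that h_k = sqrt(snr) * a_k and P = B*N0 = B = 1.
    h = [[math.sqrt(snr) * x for x in v] for v in a]
    r_mrc = r_lmmse = 0.0
    for k in range(K):
        nk = dot(h[k], h[k]).real
        # MRC rate (6.68) after the normalization.
        interf = sum(abs(dot(h[k], h[i])) ** 2 / nk for i in range(K) if i != k)
        r_mrc += math.log2(1 + nk / (interf + 1))
        # LMMSE rate (6.65): h_k^H (sum_{i != k} h_i h_i^H + I)^{-1} h_k.
        C = [[sum(h[i][r] * h[i][c].conjugate() for i in range(K) if i != k)
              + (1.0 if r == c else 0.0) for c in range(M)] for r in range(M)]
        r_lmmse += math.log2(1 + dot(h[k], solve(C, h[k])).real)
    return r_mrc, r_lmmse


for snr_db in [-30, -10, 0, 10, 30]:
    r_mrc, r_lmmse = sum_rates(10 ** (snr_db / 10))
    print(f"SNR {snr_db:4d} dB: MRC {r_mrc:7.3f}, LMMSE {r_lmmse:7.3f} bit/symbol")
```

At -30 dB the two sum rates nearly coincide. At 30 dB, MRC saturates because of the inter-user interference.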
To study the high-SNR regime, we will utilize the channel matrix notation
$\mathbf{H} = [\mathbf{h}_1,\ldots,\mathbf{h}_K] \in \mathbb{C}^{M\times K}$ and the diagonal matrix $\mathbf{Q} = \mathrm{diag}(P_1^{\mathrm{ul}}/B,\ldots,P_K^{\mathrm{ul}}/B) \in \mathbb{C}^{K\times K}$
containing the transmit powers. We assume that $\mathbf{H}^{\mathrm{H}}\mathbf{H} \in \mathbb{C}^{K\times K}$ is
invertible, which is generally the case if $M \geq K$ and the users are at physically
different locations so that their channel vectors become linearly independent.


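Suppose we gather all the LMMSE combining vectors from (6.63) as the
columns of a combining matrix $\mathbf{W} = [\mathbf{w}_1,\ldots,\mathbf{w}_K] \in \mathbb{C}^{M\times K}$. By noticing that
$\sum_{i=1}^{K}\frac{P_i^{\mathrm{ul}}}{B}\mathbf{h}_i\mathbf{h}_i^{\mathrm{H}} = \mathbf{H}\mathbf{Q}\mathbf{H}^{\mathrm{H}}$, we can express this matrix as

$$\begin{aligned} \mathbf{W} &= \left(\mathbf{H}\mathbf{Q}\mathbf{H}^{\mathrm{H}} + N_0\mathbf{I}_M\right)^{-1}\mathbf{H}\mathbf{Q} = \mathbf{H}\mathbf{Q}\left(\mathbf{H}^{\mathrm{H}}\mathbf{H}\mathbf{Q} + N_0\mathbf{I}_K\right)^{-1} = \mathbf{H}\mathbf{Q}\mathbf{Q}^{-1}\left(\mathbf{H}^{\mathrm{H}}\mathbf{H} + N_0\mathbf{Q}^{-1}\right)^{-1} \\ &\to \mathbf{H}\left(\mathbf{H}^{\mathrm{H}}\mathbf{H}\right)^{-1} = \mathbf{W}^{\mathrm{ZF}} \end{aligned} \tag{6.69}$$

as $P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}} \to \infty$, since then $N_0\mathbf{Q}^{-1} \to \mathbf{0}_{K\times K}$ (a matrix with only zeros). The
second equality in (6.69) follows from the matrix identity in (2.50).⁵ The
resulting combining scheme is called ZF because all the interference terms
become zero when using it. This can be seen from the fact that $(\mathbf{W}^{\mathrm{ZF}})^{\mathrm{H}}\mathbf{H} =
(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}\mathbf{H}^{\mathrm{H}}\mathbf{H} = \mathbf{I}_K$, which implies that

$$(\mathbf{w}_k^{\mathrm{ZF}})^{\mathrm{H}}\mathbf{h}_i = \begin{cases} 1 & \text{if } k = i, \\ 0 & \text{if } k \neq i. \end{cases} \tag{6.70}$$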
LMMSE combining turns into ZF at high SNR because the receiver noise
becomes negligibly small under these conditions; thus, the rate is maximized
by removing all interference. The interference affecting user $k$ exists
in the $(K-1)$-dimensional subspace of $\mathbb{C}^M$ spanned by the channel vectors
$\mathbf{h}_1,\ldots,\mathbf{h}_{k-1},\mathbf{h}_{k+1},\ldots,\mathbf{h}_K$ of the $K-1$ other users. This subspace is removed
from the received signal when using ZF combining because $\mathbf{w}_k^{\mathrm{ZF}}$ is selected
orthogonally to it. If we substitute the ZF combining vector into the general
rate expression in (6.58), we obtain

$$R_k^{\mathrm{ZF}} = B\log_2\left(1 + \frac{P_k^{\mathrm{ul}}}{BN_0\left[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}\right]_{kk}}\right), \tag{6.71}$$

where the interference terms vanish thanks to (6.70) and $[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$ is
the $k$th diagonal entry of $(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}$. This term is obtained by utilizing the
fact that $\|\mathbf{w}_k^{\mathrm{ZF}}\|^2 = [(\mathbf{W}^{\mathrm{ZF}})^{\mathrm{H}}\mathbf{W}^{\mathrm{ZF}}]_{kk} = [(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$ when using ZF. All
users should transmit at maximum power when ZF is used since there is no
interference.
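The ZF relations (6.69) to (6.71) can be verified for the K = 4, M = 10 LOS setup from Figure 6.16(a), assuming $\beta_k = 1$ and the ULA convention in `ula`. This sketch does three things:

1. It forms $\mathbf{W}^{\mathrm{ZF}} = \mathbf{H}(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}$ column by column.
2. It checks (6.70) and the identity $\|\mathbf{w}_k^{\mathrm{ZF}}\|^2 = [(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$.
3. It measures how close the LMMSE matrix in (6.69) is to ZF when $P/(BN_0)$ corresponds to 60 dB.

The `solve` helper is written only for this example.

```python
import cmath
import math


def ula(M, phi):
    # Assumed half-wavelength ULA response: [a]_m = exp(-j*pi*m*sin(phi)).
    return [cmath.exp(-1j * math.pi * m * math.sin(phi)) for m in range(M)]


def solve(A, b):
    # Solves A x = b via Gaussian elimination with partial pivoting.
    n = len(A)
    T = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (T[r][n] - sum(T[r][j] * x[j] for j in range(r + 1, n))) / T[r][r]
    return x


def dot(a, b):
    # Inner product a^H b.
    return sum(x.conjugate() * y for x, y in zip(a, b))


M = 10
angles = [-math.pi / 16, -math.pi / 32, 0.0, math.pi / 24]   # from Figure 6.16
H = [ula(M, phi) for phi in angles]    # H[k] is the column h_k (beta_k = 1 assumed)
K = len(H)

G = [[dot(H[i], H[j]) for j in range(K)] for i in range(K)]              # H^H H
Ginv = [solve(G, [1.0 if r == c else 0.0 for r in range(K)]) for c in range(K)]
# Ginv[c] is column c of (H^H H)^{-1}; W[k] is column k of W^ZF = H (H^H H)^{-1}.
W = [[sum(H[j][m] * Ginv[k][j] for j in range(K)) for m in range(M)] for k in range(K)]

# LMMSE matrix in (6.69) with Q = snr*I and N0 = 1: column k is (snr H H^H + I)^{-1} h_k snr.
snr = 1e6
A = [[snr * sum(H[i][r] * H[i][c].conjugate() for i in range(K)) + (1.0 if r == c else 0.0)
      for c in range(M)] for r in range(M)]
max_dev = max(abs(x - y) for k in range(K)
              for x, y in zip(solve(A, [snr * v for v in H[k]]), W[k]))

for k in range(K):
    rate = math.log2(1 + 1e3 / Ginv[k][k].real)   # (6.71) with P/(B*N0) = 30 dB
    print(f"user {k + 1}: [(H^H H)^-1]_kk = {Ginv[k][k].real:.3f}, ZF rate at 30 dB = {rate:.2f}")
print(f"max |w_LMMSE - w_ZF| at 60 dB: {max_dev:.2e}")
```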

There are other ways to achieve a high SNR than to increase the transmit
powers, as was done in (6.69). In particular, we can increase the beamforming
gain by adding more antennas. This will reduce the transmit power needed to
enter the high-SNR regime. Hence, in the Massive MIMO regime where the
base stations are equipped with many antennas compared to the number of
users, ZF will likely perform similarly to LMMSE combining.

⁵This matrix identity is applied with $\mathbf{A} = \mathbf{H}\mathbf{Q}/N_0$ and $\mathbf{B} = \mathbf{H}^{\mathrm{H}}$.

Figure 6.17: A geometrical interpretation of LMMSE, ZF, and MRC in a scenario with K = 3
users and M = 3 antennas. The focus is on user 1, while users 2 and 3 cause interference.

Figure 6.17 provides a geometrical interpretation of LMMSE, ZF, and
MRC in a scenario with K = 3 users and M = 3 antennas. The channel
vectors $\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3$ point in three different directions in the three-dimensional
vector space $\mathbb{C}^3$. We focus on receiving the signal from user 1; thus, all the
interference exists in the subspace spanned by $\mathbf{h}_2$ and $\mathbf{h}_3$. This is the red-shaded
plane in the figure. The desired signal is received along the dimension
spanned by the channel vector $\mathbf{h}_1$. If MRC is used, the combining vector
is parallel with $\mathbf{h}_1$ to maximize the received signal power. If ZF is used,
the combining vector is selected orthogonally to the subspace spanned by
$\mathbf{h}_2$ and $\mathbf{h}_3$. In general, if $M \geq K$, there are $M - K + 1 \geq 1$ dimensions free from
interference, and ZF will collect the received signal power from all of them.
LMMSE combining is a vector between the two extremes (MRC and ZF) and
moves between them depending on the SNR.

Figure 6.18 illustrates the low- and high-SNR behaviors of LMMSE combining
by continuing the example with K = 4 and M = 10 from Figure 6.16(a).
We notice that MRC provides the same sum rate as LMMSE combining at low 
SNRs, where the system performance is noise-limited. In contrast, ZF provides 
the same sum rate as LMMSE combining at high SNRs, where the system 
performance is interference-limited. There is a large gap between LMMSE 
combining and the other schemes at intermediate SNRs. 


[Figure 6.18 plot: sum rate (bit/symbol) versus SNR (dB).]


Figure 6.18: The uplink sum rate in a multi-user MIMO system with K = 4 users, M = 10 
antennas, and different linear receive combining schemes. All users have the same SNR. 


In practical systems, some users might experience high SNRs while other 
users simultaneously experience low SNRs; thus, it is preferable to utilize 
LMMSE combining to identify the rate-maximizing tradeoff between inter- 
ference and noise suppression automatically. Nevertheless, there is a vast 
literature on rate analysis for MRC and ZF, mainly focused on the respective 
asymptotic regimes where these methods are optimal. The reason is that the 
rate expressions obtained with these schemes are analytically simpler than 
the ones obtained with LMMSE combining (e.g., there is no matrix inverse) 
and more amenable to mathematical analysis and extraction of insights. 


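Example 6.6. When ZF combining is used, the effective SNR in (6.71) is
proportional to $1/[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$. How is this term distributed if the user
channels are subject to i.i.d. Rayleigh fading: $\mathbf{h}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_M, \beta_k \mathbf{I}_M)$?

The channel gain after combining is $|(\mathbf{w}_k^{\mathrm{ZF}})^{\mathrm{H}}\mathbf{h}_k|^2/\|\mathbf{w}_k^{\mathrm{ZF}}\|^2 = 1/[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$.
It is hard to analyze this expression directly, so we start from the ZF principle in Figure 6.17:
ZF projects $\mathbf{h}_k$ orthogonally to the interfering channels. We can create a
unitary matrix $\mathbf{A}_k = [\mathbf{A}_k^{\mathrm{int}}, \mathbf{A}_k^{\mathrm{free}}]$ in which the $K-1$ columns of $\mathbf{A}_k^{\mathrm{int}} \in \mathbb{C}^{M \times (K-1)}$
form an orthonormal basis of the subspace spanned by the $K-1$
interfering channels. The columns of $\mathbf{A}_k^{\mathrm{free}} \in \mathbb{C}^{M \times (M-K+1)}$ span the remaining
$M-(K-1)$ interference-free dimensions. ZF combining reduces the user
channel to $(\mathbf{A}_k^{\mathrm{free}})^{\mathrm{H}}\mathbf{h}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_{M-K+1}, \beta_k \mathbf{I}_{M-K+1})$, where the distribution follows
because i.i.d. Rayleigh fading is invariant to unitary transformations and $\mathbf{h}_k$ is independent of the
interfering channels. Hence, $1/[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk} = \|(\mathbf{A}_k^{\mathrm{free}})^{\mathrm{H}}\mathbf{h}_k\|^2$, which has a scaled
$\chi^2(2(M-K+1))$-distribution with the PDF

$$f(x) = \frac{x^{M-K} e^{-x/\beta_k}}{\beta_k^{M-K+1}(M-K)!}, \quad x \geq 0,$$

and mean value $\beta_k(M-K+1)$. This channel behaves as if we had removed $K-1$ antennas to cancel the interference.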
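The projection argument can be checked numerically. The sketch below assumes i.i.d. Rayleigh fading and computes $1/[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$ as the energy of $\mathbf{h}_k$ that remains after Gram-Schmidt has removed its components in the interference subspace. This plays the role of $\|(\mathbf{A}_k^{\mathrm{free}})^{\mathrm{H}}\mathbf{h}_k\|^2$. The sketch then compares the Monte Carlo mean with $\beta_k(M-K+1)$. The values $M = 10$, $K = 4$, and $\beta_k = 2$ are arbitrary illustrations.

```python
import math
import random


def crandn_vec(rng, n, beta=1.0):
    """Return n i.i.d. N_C(0, beta) samples."""
    s = math.sqrt(beta / 2)
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]


def inner(a, b):
    """Complex inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def zf_gain(h_k, interferers):
    """Energy of h_k outside span(interferers), which equals 1/[(H^H H)^{-1}]_kk.

    Gram-Schmidt builds an orthonormal basis of the interference subspace (like A_k^int);
    what is left of h_k after removing those components lies in the interference-free
    subspace (like A_k^free)."""
    basis = []
    for v in interferers:
        w = list(v)
        for q in basis:
            c = inner(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        basis.append([x / norm for x in w])
    r = list(h_k)
    for q in basis:
        c = inner(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return sum(abs(x) ** 2 for x in r)


def sample_zf_gains(M, K, beta_k, trials, seed=0):
    """Monte Carlo samples of 1/[(H^H H)^{-1}]_kk under i.i.d. Rayleigh fading."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        h_k = crandn_vec(rng, M, beta_k)
        # Only the subspace spanned by the interferers matters, not their variances.
        others = [crandn_vec(rng, M) for _ in range(K - 1)]
        samples.append(zf_gain(h_k, others))
    return samples


M, K, beta_k = 10, 4, 2.0
g = sample_zf_gains(M, K, beta_k, trials=4000)
print("sample mean:", round(sum(g) / len(g), 2), "theory:", beta_k * (M - K + 1))
```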
6.3.6 Power Control for Max-Min Fairness


The uplink transmit power coefficients $P_k^{\mathrm{ul}} \in [0, P]$, for $k = 1,\ldots,K$, can
be selected to maximize a specific utility function. This is known as power
control since it entails controlling the transmit power of each user to achieve
the preferred balance between the users' rates. In this section, we consider
power control for max-min fairness. We will introduce an efficient fixed-point
algorithm that obtains the power coefficients that maximize the utility function

$$u(R_1,\ldots,R_K) = \min_{k \in \{1,\ldots,K\}} R_k. \tag{6.72}$$


The max-min fairness problem was formulated in (6.4) as

$$\underset{(R_1,\ldots,R_K) \in \mathcal{R}}{\text{maximize}} \ \min_{k \in \{1,\ldots,K\}} R_k. \tag{6.73}$$


The achievable rate region R depends on the adopted receive combining 
scheme. When LMMSE combining is used, R is given by (6.66). However, for 
any linear processing scheme, we can express the rate region in the generic 
form 


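$$\mathcal{R} = \Big\{(R_1,\ldots,R_K) : R_k \leq B \log_2\big(1 + \mathrm{SINR}_k(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}})\big) \text{ for } k = 1,\ldots,K, \text{ for some } P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}} \in [0, P]\Big\}, \tag{6.74}$$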


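where the SINR of each user is a function of the transmit power coefficients
$P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}$. The SINR of user $k$ obtained by LMMSE combining is given in
(6.65) as

$$\mathrm{SINR}_k^{\mathrm{LMMSE}}(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}) = P_k^{\mathrm{ul}} \mathbf{h}_k^{\mathrm{H}} \Big(\sum_{i=1, i\neq k}^{K} P_i^{\mathrm{ul}} \mathbf{h}_i \mathbf{h}_i^{\mathrm{H}} + BN_0 \mathbf{I}_M\Big)^{-1} \mathbf{h}_k. \tag{6.75}$$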
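Similarly, the SINRs of user $k$ when using MRC or ZF are given in (6.68) and
(6.71), respectively, as

$$\mathrm{SINR}_k^{\mathrm{MRC}}(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}) = \frac{P_k^{\mathrm{ul}} \|\mathbf{h}_k\|^2}{\sum_{i=1, i\neq k}^{K} P_i^{\mathrm{ul}} \frac{|\mathbf{h}_i^{\mathrm{H}}\mathbf{h}_k|^2}{\|\mathbf{h}_k\|^2} + BN_0}, \tag{6.76}$$

$$\mathrm{SINR}_k^{\mathrm{ZF}}(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}) = \frac{P_k^{\mathrm{ul}}}{BN_0 [(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}}. \tag{6.77}$$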


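Maximizing the minimum rate is equivalent to maximizing the minimum
SINR among the users. Hence, the max-min fairness problem in (6.73) can be
expressed for uplink multi-user MIMO with linear processing as

$$\begin{aligned} \underset{P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}}{\text{maximize}} \quad & \min_{k \in \{1,\ldots,K\}} \mathrm{SINR}_k(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}) \\ \text{subject to} \quad & 0 \leq P_k^{\mathrm{ul}} \leq P, \quad k = 1,\ldots,K. \end{aligned} \tag{6.78}$$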
Since the lowest SINR determines the utility, there is no incentive to provide
any user with a larger SINR than the others. This property is crucial in
devising Algorithm 6.1, which finds the optimal solution. The algorithm starts
from arbitrarily selected non-zero power coefficients $P_k^{\mathrm{ul}} \in (0, P]$ and sets a
solution accuracy $\epsilon > 0$. In Step 3, each user that achieves an SINR larger
than the current minimum SINR reduces its transmit power. Next, in Step
4, all the power coefficients are scaled so that at least one user transmits at
maximum power. These steps are repeated until a stopping criterion
is satisfied. The difference between the maximum and minimum SINRs among
the users goes to zero asymptotically; thus, the stopping criterion in Step 2
identifies when the difference becomes smaller than $\epsilon$. The algorithm usually
converges in fewer than ten iterations because Steps 3 and 4 form a so-called fixed-point
iteration that rapidly reduces the range of power values to consider.

The algorithm has been stated as if any SINR expression can be utilized, 
but certain technical conditions must be satisfied; we refer to [91, Lem. 1, Th. 1] 
for the specific details. These conditions are satisfied when using LMMSE 
combining or MRC, in which case convergence to the optimal solution to 
(6.78) is guaranteed. The algorithm builds on Perron-Frobenius theory and 
interference functions covered in the textbooks [92], [93]. 


Example 6.7. Consider max-min fairness power control along with ZF combining. Show that an optimal solution is to use full power for all users. Find
the corresponding max-min fair rate.

The SINR expression in (6.77) with ZF is $P_k^{\mathrm{ul}}/(BN_0[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk})$. The
SINR of user $k$ is an increasing function of $P_k^{\mathrm{ul}}$ but is unaffected by the powers
used by the other users. Hence, the only way to improve a user's rate is to increase
its power, and this can be done without degrading the rate of anyone else. Therefore, using full
power $P_k^{\mathrm{ul}} = P$ for all users is one solution to the max-min fairness problem.
The corresponding max-min fair rate is the minimum rate among the users:

$$r^{\text{max-min fair}} = \min_{k \in \{1,\ldots,K\}} B \log_2\left(1 + \frac{P}{BN_0[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}}\right). \tag{6.79}$$

The users typically get different rates when using full power, and only the user
with the largest value of $[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$ achieves exactly $r^{\text{max-min fair}}$, while the
others achieve larger rates. We can alternatively ensure that all users get
exactly the rate $r^{\text{max-min fair}}$ by selecting the transmit powers as

$$P_k^{\mathrm{ul}} = \frac{[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}}{\max_{i \in \{1,\ldots,K\}} [(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{ii}} P, \quad k = 1,\ldots,K. \tag{6.80}$$

The non-uniqueness of the solution to the max-min fairness problem is why
Algorithm 6.1 might not converge when using ZF combining.
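The ZF case in Example 6.7 can be sketched in a few lines. The code below is an illustration under assumed values: one random channel realization with $M = 6$ and $K = 4$, $P = 10$, and $BN_0 = 1$. It computes $1/[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$ by Gram-Schmidt projection rather than by a matrix inverse. It then applies (6.80) and checks that every user reaches the full-power minimum SINR.

```python
import math
import random


def inner(a, b):
    """Complex inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def zf_gain(h_k, interferers):
    """1/[(H^H H)^{-1}]_kk, computed as the energy of h_k outside span(interferers)."""
    basis = []
    for v in interferers:
        w = list(v)
        for q in basis:
            c = inner(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        basis.append([x / norm for x in w])
    r = list(h_k)
    for q in basis:
        c = inner(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return sum(abs(x) ** 2 for x in r)


def zf_max_min_powers(H, P, noise):
    """Full-power ZF SINRs (6.77) and the equal-SINR powers (6.80); noise stands for B*N0."""
    K = len(H)
    g = [zf_gain(H[k], H[:k] + H[k + 1:]) for k in range(K)]  # g_k = 1/[(H^H H)^{-1}]_kk
    full_power_sinr = [P * gk / noise for gk in g]
    # (6.80): P_k = P [..]_kk / max_i [..]_ii, which equals P * min_i g_i / g_k
    g_min = min(g)
    powers = [P * g_min / gk for gk in g]
    equal_sinr = [pk * gk / noise for pk, gk in zip(powers, g)]
    return full_power_sinr, powers, equal_sinr


rng = random.Random(7)
s = math.sqrt(0.5)
H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(6)] for _ in range(4)]
full, powers, equal = zf_max_min_powers(H, P=10.0, noise=1.0)
r_fair = math.log2(1 + min(full))  # max-min fair rate (6.79) in bit/symbol, i.e., B = 1
print([round(p, 3) for p in powers], round(r_fair, 3))
```

The user with the largest $[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$ keeps full power, while all other users lower their power until they land exactly on $r^{\text{max-min fair}}$.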
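Algorithm 6.1 Solution to the max-min fairness problem in (6.78).

1: Initialization: Select arbitrary $P_k^{\mathrm{ul}} \in (0, P]$, for $k = 1,\ldots,K$, and the solution accuracy $\epsilon > 0$
2: while $\max_{i \in \{1,\ldots,K\}} \mathrm{SINR}_i(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}) - \min_{i \in \{1,\ldots,K\}} \mathrm{SINR}_i(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}) > \epsilon$ do
3:   $P_k^{\mathrm{ul}} \leftarrow \dfrac{\min_{i \in \{1,\ldots,K\}} \mathrm{SINR}_i(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}})}{\mathrm{SINR}_k(P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}})} P_k^{\mathrm{ul}}$, for $k = 1,\ldots,K$
4:   $P_k^{\mathrm{ul}} \leftarrow \dfrac{P}{\max_{i \in \{1,\ldots,K\}} P_i^{\mathrm{ul}}} P_k^{\mathrm{ul}}$, for $k = 1,\ldots,K$
5: end while
6: Output: $P_1^{\mathrm{ul}},\ldots,P_K^{\mathrm{ul}}$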
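A minimal Python sketch of Algorithm 6.1 is given below. It uses the MRC SINR in (6.76), for which convergence to the optimum is guaranteed. The channel realization ($M = 6$, $K = 4$), $P = 10$, and $BN_0 = 1$ are hypothetical choices. The iteration cap is an added safeguard that is not part of the algorithm.

```python
import math
import random


def inner(a, b):
    """Complex inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def sinr_mrc(H, powers, noise):
    """MRC SINRs from (6.76); H is a list of K channel vectors and noise stands for B*N0."""
    K = len(H)
    out = []
    for k in range(K):
        norm2 = inner(H[k], H[k]).real
        interf = sum(powers[i] * abs(inner(H[i], H[k])) ** 2 / norm2
                     for i in range(K) if i != k)
        out.append(powers[k] * norm2 / (interf + noise))
    return out


def max_min_power_control(H, P, noise, eps=1e-6, max_iter=10_000):
    """Sketch of Algorithm 6.1 with MRC. Returns (powers, SINRs, number of updates)."""
    K = len(H)
    powers = [P] * K                                         # Step 1: any start in (0, P]
    for it in range(max_iter):
        sinr = sinr_mrc(H, powers, noise)
        if max(sinr) - min(sinr) <= eps:                     # Step 2: stopping criterion
            return powers, sinr, it
        s_min = min(sinr)
        powers = [s_min / sinr[k] * powers[k] for k in range(K)]  # Step 3
        p_max = max(powers)
        powers = [p * P / p_max for p in powers]                   # Step 4
    raise RuntimeError("no convergence within max_iter iterations")


rng = random.Random(3)
s = math.sqrt(0.5)
H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(6)] for _ in range(4)]
P, noise = 10.0, 1.0
powers, sinr, iters = max_min_power_control(H, P, noise)
full = sinr_mrc(H, [P] * 4, noise)
print(iters, [round(p, 3) for p in powers], round(math.log2(1 + min(sinr)), 3))
```

Steps 3 and 4 are applied to all users simultaneously, using the SINRs computed at the start of each iteration.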


Figure 6.19 demonstrates the max-min fairness solution obtained by Algorithm 6.1 in a system with K = 4 users. The setup is the same as in
Figure 6.16(a) and Figure 6.18. Each user achieves an SNR of 10 dB when
using full power. In Figure 6.19(a), we show how the rates obtained by the
four users vary with the iterations of the algorithm for M = 6 antennas
and LMMSE combining. During the initial iterations, there are substantial
rate variations between the users. However, as the algorithm proceeds, the
four rates converge to a common value: the max-min fairness solution. The
minimum rate among the users is gradually improved, but the convergence is
not monotonic because a power reduction for some users will improve the rates
of other users. In this example, 6 to 8 iterations are sufficient for convergence,
and similar behavior can be expected in other scenarios.

Figure 6.19(b) shows the minimum rate among the K = 4 users for different 
numbers of antennas M. Both LMMSE combining and MRC are considered. 
In addition to the max-min fairness solutions obtained by Algorithm 6.1, the 
minimum rates achieved when using full power at every user are shown as 
references. In all the considered cases, the minimum rate increases with M, 
which highlights how the communication performance is improved by using 
more antennas. As expected, the max-min fairness power control provides 
larger minimum rates than full-power transmission for both combining schemes. 
With LMMSE combining, the gap between the max-min fairness solution and
full-power transmission shrinks as the number of antennas increases and
vanishes asymptotically. The reason is that LMMSE combining becomes more similar to
ZF combining as M grows, and Example 6.7 demonstrated that all users
should then use full power.
(b) The minimum rate versus the number of antennas M.


Figure 6.19: The max-min fairness solution obtained by Algorithm 6.1 with K = 4 users, using 
the setup from Figure 6.16(a). All the users have the same SNR of 10dB when using full power. 
In (a), the rates of the four users during the algorithm’s iterations are shown when LMMSE 
combining and M = 6 antennas are used. In (b), the minimum rate among the users is shown 
for a varying number of antennas M when using LMMSE and MRC combining. The minimum 
rate obtained using full power $P_k^{\mathrm{ul}} = P$ for each user is shown as a reference.
6.4 Downlink Communications


There are several ways to operate the downlink of a multi-user system. The 
channel gains are identical in uplink and downlink, but the resource allocation 
solutions differ for two main reasons. Firstly, the base station can divide its 
power flexibly between the users in the downlink, while each user has an 
individual power budget in the uplink. Secondly, interference has a different 
impact since each user receives it through the same channel as its desired 
signal in the downlink, while it is received through different user channels in 
the uplink. In line with the uplink analysis in Section 6.3, we will consider 
three types of downlink operation: OMA, NOMA, and multi-user MIMO. 


6.4.1 Orthogonal Multiple Access 


We begin by considering the scenario where a single-antenna base station
transmits to K single-antenna user devices over a shared communication
channel with the total bandwidth B Hz. We will use FDMA, the OMA
scheme where the bandwidth is divided orthogonally between the users. We
let $\xi_k \in [0,1]$ denote the bandwidth fraction allocated to user $k$, for $k = 1,\ldots,K$, and recall that these fractions can be selected arbitrarily as long as
$\xi_1 + \xi_2 + \cdots + \xi_K \leq 1$. Power allocation is another design dimension. The
base station has a maximum transmit power denoted by $P$, which it can
divide freely between the users. We let $P_k^{\mathrm{dl}} \in [0, P]$ denote the power allocated
to user $k$, for $k = 1,\ldots,K$, and notice that these powers can be selected
arbitrarily under the constraint $P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}} + \cdots + P_K^{\mathrm{dl}} \leq P$. The channel gain
of user $k$ is denoted by $\beta_k \in [0,1]$. We assume the users are ordered such that
$\beta_1 \geq \beta_2 \geq \cdots \geq \beta_K > 0$, which can be done without loss of generality.
Under these assumptions, user $k$ experiences a point-to-point system with
signal power $P_k^{\mathrm{dl}}$ and bandwidth $\xi_k B$, so the data rate in (6.8) becomes


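$$R_k(\xi_k, P_k^{\mathrm{dl}}) = \xi_k C\left(\frac{P_k^{\mathrm{dl}}\beta_k}{\xi_k B N_0}\right) = \xi_k B \log_2\left(1 + \frac{P_k^{\mathrm{dl}}\beta_k}{\xi_k B N_0}\right) \text{ bit/s}, \tag{6.81}$$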


where the notation $R_k(\xi_k, P_k^{\mathrm{dl}})$ emphasizes that the rate is a function of the
bandwidth and power allocated to the user. It is a strictly increasing function 
of both variables (as can be shown by computing the first-order derivatives), 
which shows the fundamental conflict in resource allocation: If we increase a 
specific user’s rate by allocating more power or bandwidth, we must take this 
power/bandwidth from other users that will experience rate reductions. 
Based on the rate expression in (6.81), we can define the rate region as 


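$$\mathcal{R} = \left\{\big(R_1(\xi_1, P_1^{\mathrm{dl}}),\ldots,R_K(\xi_K, P_K^{\mathrm{dl}})\big) : R_k(\xi_k, P_k^{\mathrm{dl}}) = \xi_k B\log_2\left(1 + \frac{P_k^{\mathrm{dl}}\beta_k}{\xi_k BN_0}\right),\ \xi_k, P_k^{\mathrm{dl}} \geq 0,\ k = 1,\ldots,K,\ \xi_1 + \cdots + \xi_K \leq 1,\ P_1^{\mathrm{dl}} + \cdots + P_K^{\mathrm{dl}} \leq P\right\}, \tag{6.82}$$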
which is the set of all points $(R_1(\xi_1, P_1^{\mathrm{dl}}),\ldots,R_K(\xi_K, P_K^{\mathrm{dl}}))$ that can be
achieved by dividing the bandwidth and power between the users in different permissible ways. There must be equality in the last two constraints to
reach the Pareto boundary. However, this is not a sufficient condition because
some points in the rate region's interior also use all the bandwidth and power.
This situation differs from the uplink, where the bandwidth fractions are
the only parameters varied in the rate region. In the downlink, even if a
point uses all the bandwidth and power, it might be possible to improve one user's rate
without sacrificing the others' by jointly changing the bandwidth and power values.
For simulation purposes, the Pareto boundary can be identified by generating
many points that belong to the rate region and calculating their convex hull
(i.e., the smallest convex set that encloses all the points).$^{6}$

Suppose we want to achieve the maximum sum rate. For any given set of
transmit powers $P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}$, one can prove (by differentiation) that the sum
rate is maximized by selecting the bandwidth fractions proportionally to the
users' received signal powers:

$$\xi_k = \frac{P_k^{\mathrm{dl}}\beta_k}{\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i}. \tag{6.83}$$

By substituting this value into (6.81), the rate achieved by user $k$ becomes

$$R_k\left(\frac{P_k^{\mathrm{dl}}\beta_k}{\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i}, P_k^{\mathrm{dl}}\right) = \frac{P_k^{\mathrm{dl}}\beta_k}{\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i}\, C\left(\frac{\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i}{BN_0}\right) \text{ bit/s} \tag{6.84}$$

and the sum rate becomes

$$\sum_{k=1}^{K} R_k\left(\frac{P_k^{\mathrm{dl}}\beta_k}{\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i}, P_k^{\mathrm{dl}}\right) = C\left(\frac{\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i}{BN_0}\right) \text{ bit/s}. \tag{6.85}$$

We can further maximize this expression by selecting the power allocation.
The $C(\cdot)$ function is increasing in its argument $\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i/(BN_0)$. The
expression $\sum_{i=1}^{K} P_i^{\mathrm{dl}}\beta_i$ is maximized by assigning all power to the user with the
largest $\beta_i$, which is user 1 based on the assumed user ordering. Hence, the sum
rate is maximized by $P_1^{\mathrm{dl}} = P$ and $P_i^{\mathrm{dl}} = 0$ for $i = 2,\ldots,K$. The maximum
sum rate equals the single-user capacity $C_1^{\mathrm{su}} = B\log_2(1 + P\beta_1/(BN_0))$ of
user 1. From a sum-rate perspective, it is therefore beneficial to serve only one user,
while any attempt to serve multiple users results in a sum rate reduction
(except if the served users have identical channel gains). Nevertheless, multiple users
must be served in practical multi-user systems because everyone must be served
to some extent. This issue is different from the uplink, for which we noticed
in Section 6.3.1 that the sum-rate-maximizing solution with FDMA is to split


$^{6}$A close approximation of the Pareto boundary is obtained by assuming that the power
is allocated proportionally to the bandwidth: $P_k^{\mathrm{dl}} = P\xi_k$. The approximate boundary is then
generated by varying $\xi_1,\ldots,\xi_K$ under the condition that $\xi_1 + \xi_2 + \cdots + \xi_K = 1$.
Figure 6.20: A discrete memoryless broadcast channel with K = 2 users and $l$ denoting the
discrete-time index. The input signal is the superposition of $x_1[l]$ and $x_2[l]$, designated for user
1 and user 2, respectively. The output at user $k$ is $y_k[l] = h_k(x_1[l] + x_2[l]) + n_k[l]$, where $h_k$ is
the channel response and $n_k[l]$ is the independent complex Gaussian receiver noise, for $k = 1, 2$.


the bandwidth proportionally to the channel gains, which results in non-zero 
rates for everyone. The reason for the difference is that the base station can 
divide the power arbitrarily between the users in the downlink, while each 
user has a fixed power budget in the uplink, so they must all be served to use 
all the available transmit power. 
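The proportional bandwidth rule (6.83) and the resulting sum rate (6.85) can be checked with the following sketch. The channel gains, the power split, and the normalization $B = N_0 = 1$ are hypothetical values chosen for illustration.

```python
import math


def fdma_rates(xi, p, beta, B=1.0, N0=1.0):
    """Per-user FDMA rates (6.81): xi_k * B * log2(1 + p_k * beta_k / (xi_k * B * N0))."""
    return [0.0 if x == 0 else x * B * math.log2(1 + pk * bk / (x * B * N0))
            for x, pk, bk in zip(xi, p, beta)]


def proportional_bandwidth(p, beta):
    """Bandwidth fractions (6.83): proportional to the received powers p_k * beta_k."""
    total = sum(pk * bk for pk, bk in zip(p, beta))
    return [pk * bk / total for pk, bk in zip(p, beta)]


beta = [3.0, 1.5, 0.5]   # hypothetical channel gains, beta_1 >= beta_2 >= beta_3
P = 10.0                 # hypothetical total power
p = [4.0, 3.0, 3.0]      # an arbitrary power split with sum P

xi = proportional_bandwidth(p, beta)
sum_prop = sum(fdma_rates(xi, p, beta))          # equals log2(1 + sum_k p_k beta_k), cf. (6.85)
sum_equal = sum(fdma_rates([1 / 3] * 3, p, beta))
sum_single = math.log2(1 + P * beta[0])          # C_1^su: all power and bandwidth to user 1
print(round(sum_equal, 3), round(sum_prop, 3), round(sum_single, 3))
```

For a fixed power split, the proportional rule beats the equal bandwidth split. Moving all power to user 1 then gives the larger value $C_1^{\mathrm{su}}$.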


6.4.2 Non-Orthogonal Multiple Access 


We can achieve a larger rate region by letting the users share all time-frequency 
resources instead of dividing them orthogonally using FDMA. This was 
demonstrated in Section 6.3.2 for the uplink, while this section considers the 
downlink counterpart. A downlink setup with inter-user interference is called 
a broadcast channel, and the transmission scheme is called NOMA. 

We will first describe the downlink NOMA scheme in the case of K = 2
users that share a bandwidth of B Hz. The base station divides its maximum
transmit power $P$ between the users so that $P_k^{\mathrm{dl}} \in [0, P]$ is the power assigned
to user $k$, for $k = 1,2$. We consider a discrete memoryless broadcast channel
where the base station transmits simultaneously to both users, as illustrated
in Figure 6.20. The received signal at user $k$ at the discrete time $l$ is

$$y_k[l] = h_k\big(x_1[l] + x_2[l]\big) + n_k[l], \quad k = 1, 2, \tag{6.86}$$

where $x_1[l]$ is the input signal designated for user 1, $x_2[l]$ is the input signal
meant for user 2, and $n_k[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the independent receiver noise. The
complex channel response to user $k$ is denoted as $h_k$ and is assumed to be
deterministic. Its squared magnitude is denoted as $\beta_k = |h_k|^2$.

Each user receives a superposition of both signals, but despite the mutual
interference, it is possible to extract data if it is encoded correctly. Suppose
the two input signals are independently complex Gaussian distributed so that
$x_k[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_k^{\mathrm{dl}}/B)$ for $k = 1,2$, where the symbol power is $P_k^{\mathrm{dl}}/B$. We
further assume that the users are ordered such that $\beta_1 \geq \beta_2$. The received
signal at user 2 in (6.86) can be expressed as

$$y_2[l] = h_2 x_2[l] + n_2'[l], \tag{6.87}$$

where $n_2'[l] = h_2 x_1[l] + n_2[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_1^{\mathrm{dl}}\beta_2/B + N_0)$ is an effective noise term
that is independent and complex Gaussian distributed. Hence, it follows from
Corollary 2.1 that an achievable rate of user 2 is

$$R_2 = C\left(\frac{P_2^{\mathrm{dl}}\beta_2}{P_1^{\mathrm{dl}}\beta_2 + BN_0}\right) \text{ bit/s}. \tag{6.88}$$


This expression resembles the ones obtained in the uplink, but an essential
difference is that the interference term $P_1^{\mathrm{dl}}\beta_2$ depends on the user's own
channel gain $\beta_2$ and not on the channel gain $\beta_1$ of the other user. The reason is
that the interfering downlink signal arrives through the same channel from
the base station as the desired signal.

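If user 1 knows the channel code used for user 2, it can try to
decode the signal designated for user 2. The received signal in (6.86) can then
be expressed as

$$y_1[l] = h_1 x_2[l] + n_1'[l], \tag{6.89}$$

where $n_1'[l] = h_1 x_1[l] + n_1[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_1^{\mathrm{dl}}\beta_1/B + N_0)$ is the effective noise
term. Hence, an achievable rate for decoding the signal of user 2 at user 1 is

$$C\left(\frac{P_2^{\mathrm{dl}}\beta_1}{P_1^{\mathrm{dl}}\beta_1 + BN_0}\right) \text{ bit/s}. \tag{6.90}$$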


This value is larger than or equal to (6.88) because we assumed that $\beta_1 \geq \beta_2$.
Consequently, any rate achievable for user 2 is also achievable for user 1, in the
sense that user 1 can also decode the signal successfully. Although user 1 is not interested
in the actual data designated for user 2 (the data can even be encrypted so
only user 2 can extract its original meaning), it can decode it to apply SIC.
By subtracting the contribution of the decoded signal sequence $\{x_2[l]\}$ from the original received
signal in (6.86), user 1 obtains

$$y_1[l] - h_1 x_2[l] = h_1 x_1[l] + n_1[l]. \tag{6.91}$$

This is a conventional SISO channel without interference, so the achievable
rate for user 1 regarding its designated signal is

$$R_1 = C\left(\frac{P_1^{\mathrm{dl}}\beta_1}{BN_0}\right) \text{ bit/s}. \tag{6.92}$$


This is the same rate as if the user were assigned the entire bandwidth in OMA,
except that the power $P_1^{\mathrm{dl}}$ might be smaller than the maximum transmit
power, depending on the base station's selected power allocation. Since the
SIC procedure is utilized, we have obtained a non-linear receiver processing
scheme where we both scale the received signal before decoding and subtract
interference from previously decoded signals. When we utilized SIC for uplink 
NOMA in Section 6.3.2, we could order the users arbitrarily. The situation 
is different in the downlink because the user with the strongest channel can 
decode signals encoded for users with weaker channels, but not vice versa. 
This makes the rate region easier to characterize since all points are obtained 
by varying the power allocation: 


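$$\mathcal{R} = \left\{(R_1, R_2) : 0 \leq R_1 \leq C\left(\frac{P_1^{\mathrm{dl}}\beta_1}{BN_0}\right),\ 0 \leq R_2 \leq C\left(\frac{P_2^{\mathrm{dl}}\beta_2}{P_1^{\mathrm{dl}}\beta_2 + BN_0}\right), \text{ for some } P_1^{\mathrm{dl}}, P_2^{\mathrm{dl}} \geq 0,\ P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}} \leq P\right\}. \tag{6.93}$$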


The Pareto boundary is obtained whenever the maximum power is utilized: 


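$$\partial\mathcal{R} = \left\{\left(C\left(\frac{P_1^{\mathrm{dl}}\beta_1}{BN_0}\right), C\left(\frac{P_2^{\mathrm{dl}}\beta_2}{P_1^{\mathrm{dl}}\beta_2 + BN_0}\right)\right) : P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}} = P,\ P_1^{\mathrm{dl}}, P_2^{\mathrm{dl}} \geq 0\right\}.$$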


It follows from (6.88) and (6.92) that the sum rate with NOMA is 


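$$\begin{aligned} R_1 + R_2 &= B\log_2\left(1 + \frac{P_1^{\mathrm{dl}}\beta_1}{BN_0}\right) + B\log_2\left(1 + \frac{P_2^{\mathrm{dl}}\beta_2}{P_1^{\mathrm{dl}}\beta_2 + BN_0}\right) \\ &= B\log_2\left(\frac{P_1^{\mathrm{dl}}\beta_1 + BN_0}{BN_0}\right) + B\log_2\left(\frac{(P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}})\beta_2 + BN_0}{P_1^{\mathrm{dl}}\beta_2 + BN_0}\right) \\ &= B\log_2\left(1 + \frac{(P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}})\beta_2}{BN_0}\right) + B\log_2\left(\frac{P_1^{\mathrm{dl}}\beta_1 + BN_0}{P_1^{\mathrm{dl}}\beta_2 + BN_0}\right), \end{aligned} \tag{6.94}$$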


where the last equality follows from swapping the numerators between the two
logarithms. The first term depends on $P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}}$ but not on how the power
is allocated between the users, so it is maximized when using the maximum
power $P$. The second term is an increasing function of $P_1^{\mathrm{dl}}$ (since $\beta_1 \geq \beta_2$) but independent
of $P_2^{\mathrm{dl}}$; thus, the sum rate is maximized by setting $P_1^{\mathrm{dl}} = P$ and $P_2^{\mathrm{dl}} = 0$.
The resulting maximum sum rate is $B\log_2(1 + P\beta_1/(BN_0)) = C_1^{\mathrm{su}}$, which is
the single-user capacity of user 1. This is the same maximum value as with
FDMA; thus, NOMA cannot improve the sum rate compared to OMA and
reduces to single-user transmission at the optimal point. Nevertheless, the rate region in
(6.93) is larger than the region in (6.82) obtained by FDMA.

The rate regions with NOMA and FDMA are exemplified in Figure 6.21(a)
with B = 10 MHz. This is the downlink counterpart to the two-user scenario
considered in Figure 6.9. The two users have unequal channel qualities that
become $\frac{P\beta_1}{2BN_0} = 10$ and $\frac{P\beta_2}{2BN_0} = 5$ if an equal power allocation of $P/2$ per user
is used. The rate region with FDMA has a nearly linear Pareto boundary.
(a) Comparison of downlink NOMA with OMA based on FDMA.
(b) Comparison of downlink NOMA and uplink NOMA.


Figure 6.21: Example of downlink rate regions for K = 2 users with different channel qualities. 
This is a continuation of the example in Figure 6.9. 
The boundary with NOMA is more curved and results in a slightly larger
region, demonstrating that it is easier to balance user performance when 
using NOMA. The sum rate is maximized at the single-user capacity point 
(43.9,0) Mbit/s that both access schemes can achieve. Max-min fairness is 
achieved at (20.7, 20.7) Mbit/s with NOMA and at (19.4, 19.4) Mbit/s with 
FDMA, so NOMA increases this rate point by 7%. In other words, fairness is 
achieved with a smaller sum-rate reduction when using NOMA. 
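The max-min fair points quoted for Figure 6.21(a) can be approximated numerically. The sketch below assumes the setup described above: $B = 10$ MHz and full-power SNRs $P\beta_1/(BN_0) = 20$ and $P\beta_2/(BN_0) = 10$. It finds the NOMA point by bisection on the power split in (6.88) and (6.92). It finds the FDMA point by a grid search over the bandwidth and power splits in (6.81). The grid resolution is an arbitrary choice, so the FDMA value can be a slight underestimate.

```python
import math

B = 10e6                  # bandwidth [Hz]
SNR1, SNR2 = 20.0, 10.0   # P*beta_k/(B*N0): 10 and 5 per user with P/2 each


def noma_rates(a):
    """Rates (6.92) and (6.88) in bit/s when user 1 gets the power fraction a = P_1/P."""
    r1 = B * math.log2(1 + a * SNR1)
    r2 = B * math.log2(1 + (1 - a) * SNR2 / (a * SNR2 + 1))
    return r1, r2


def noma_max_min(tol=1e-12):
    """Bisection on a: r1 increases and r2 decreases with a, so they cross once."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        r1, r2 = noma_rates(mid)
        if r1 < r2:
            lo = mid
        else:
            hi = mid
    return min(noma_rates((lo + hi) / 2))


def fdma_rates(xi, a):
    """Rates (6.81) with bandwidth fraction xi and power fraction a given to user 1."""
    def r(bw, pw, snr):
        return 0.0 if bw == 0 else bw * B * math.log2(1 + pw * snr / bw)
    return r(xi, a, SNR1), r(1 - xi, 1 - a, SNR2)


def fdma_max_min(n=200):
    """Grid search over (xi, a); the result never exceeds the true optimum."""
    best = 0.0
    for i in range(1, n):
        for j in range(n + 1):
            best = max(best, min(fdma_rates(i / n, j / n)))
    return best


r_noma, r_fdma = noma_max_min(), fdma_max_min()
print(f"NOMA {r_noma / 1e6:.2f} Mbit/s, FDMA {r_fdma / 1e6:.2f} Mbit/s")
```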

Figure 6.21(b) compares the rate region with downlink NOMA and the 
rate region with uplink NOMA, which was previously shown in Figure 6.9. The 
channel gains and total transmit power are the same, but the downlink region is 
nevertheless larger. This is thanks to the flexible downlink power allocation that 
can divide the power unequally between the users. This benefit is particularly 
large close to the single-user capacity points. The Pareto boundaries intersect 
in one point, achieved by equal downlink power allocation. 


Example 6.8. What is the shape of the rate region with NOMA if $\beta_1 = \beta_2$?
The users could have been ordered arbitrarily in this situation, which
implies that the rate region is symmetric. The sum rate in (6.94) reduces to

$$R_1 + R_2 = B\log_2\left(1 + \frac{(P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}})\beta_1}{BN_0}\right) \leq B\log_2\left(1 + \frac{P\beta_1}{BN_0}\right), \tag{6.95}$$

where the upper bound is achieved for any power allocation that satisfies
$P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}} = P$. Hence, the Pareto boundary is the straight line between the
single-user points $(C_1^{\mathrm{su}}, 0)$ and $(0, C_2^{\mathrm{su}})$. Any point on that line is achieved by
one specific selection of $P_1^{\mathrm{dl}}$ and $P_2^{\mathrm{dl}} = P - P_1^{\mathrm{dl}}$.


The rate region with NOMA for $K > 2$ users can be derived by following
the same principles as with two users. The received signal at user $k$ at the
discrete time $l$ is

$$y_k[l] = h_k \sum_{i=1}^{K} x_i[l] + n_k[l], \tag{6.96}$$

where $x_i[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_i^{\mathrm{dl}}/B)$ is the independent data signal transmitted to
user $i$ with the power $P_i^{\mathrm{dl}}$, for $i = 1,\ldots,K$, and $n_k[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the
independent receiver noise. We assume the users are ordered such that $\beta_1 \geq \beta_2 \geq \cdots \geq \beta_K > 0$. User $k$ can then decode the signals intended for users
$K, K-1, \ldots, k+1$ (in descending order) because it has a stronger channel than
them. By subtracting those interfering signals before decoding its desired
signal, user $k$ will only be exposed to interference from users $1,\ldots,k-1$:

$$y_k[l] - h_k \sum_{i=k+1}^{K} x_i[l] = h_k x_k[l] + \underbrace{h_k \sum_{i=1}^{k-1} x_i[l] + n_k[l]}_{\sim\, \mathcal{N}_{\mathbb{C}}\left(0,\, \sum_{i=1}^{k-1} P_i^{\mathrm{dl}}\beta_k/B + N_0\right)}. \tag{6.97}$$
By treating the last terms as an effective noise term, an achievable rate for
user $k$ becomes

$$R_k = C\left(\frac{P_k^{\mathrm{dl}}\beta_k}{\sum_{i=1}^{k-1} P_i^{\mathrm{dl}}\beta_k + BN_0}\right) \text{ bit/s}. \tag{6.98}$$


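Theorem 6.3. Consider a $K$-user discrete memoryless broadcast channel with
the input $x_1 + \cdots + x_K \in \mathbb{C}$ and the outputs $y_1,\ldots,y_K \in \mathbb{C}$ given by

$$y_k = h_k \sum_{i=1}^{K} x_i + n_k, \quad k = 1,\ldots,K, \tag{6.99}$$

where $n_k \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise and $h_1,\ldots,h_K \in \mathbb{C}$ are constant
channel coefficients known at the output. Suppose the input distributions
are feasible whenever $\mathbb{E}\{|x_k|^2\} \leq P_k^{\mathrm{dl}}/B$, where the transmit powers
$P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}} \geq 0$ satisfy $P_1^{\mathrm{dl}} + \cdots + P_K^{\mathrm{dl}} \leq P$, $P$ denotes the maximum transmit
power, and $B$ is the bandwidth (and symbol rate). If $R_k$ denotes the rate of
user $k$ and $\beta_k = |h_k|^2$, with the users ordered such that $\beta_1 \geq \cdots \geq \beta_K$, the capacity region is given by

$$\mathcal{R} = \left\{(R_1,\ldots,R_K) : 0 \leq R_k \leq C\left(\frac{P_k^{\mathrm{dl}}\beta_k}{\sum_{i=1}^{k-1} P_i^{\mathrm{dl}}\beta_k + BN_0}\right),\ k = 1,\ldots,K, \text{ for some } P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}} \geq 0,\ P_1^{\mathrm{dl}} + \cdots + P_K^{\mathrm{dl}} \leq P\right\}. \tag{6.100}$$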
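The rate expression (6.98) and the claim that NOMA cannot exceed the single-user capacity of user 1 can be checked by random sampling. In the sketch below, the gains follow the Figure 6.22 setup under an assumed normalization ($B = N_0 = 1$ and $P = 3$, so that $P\beta_k/(3BN_0)$ equals 10, 5, and 2.5). The sketch tries 1000 random power splits.

```python
import math
import random


def noma_rates(powers, betas, B=1.0, N0=1.0):
    """Achievable rates (6.98) with SIC; betas must be non-increasing.
    User k cancels users k+1..K and treats users 1..k-1 as noise."""
    rates = []
    for k, (p_k, b_k) in enumerate(zip(powers, betas)):
        residual_interference = sum(powers[:k]) * b_k
        rates.append(B * math.log2(1 + p_k * b_k / (residual_interference + B * N0)))
    return rates


P = 3.0
betas = [10.0, 5.0, 2.5]
c1_su = math.log2(1 + P * betas[0])   # single-user capacity of user 1 (B = 1)

rng = random.Random(0)
worst_gap = float("inf")
for _ in range(1000):
    w = [rng.random() for _ in betas]
    powers = [P * wi / sum(w) for wi in w]   # random split with P_1 + P_2 + P_3 = P
    worst_gap = min(worst_gap, c1_su - sum(noma_rates(powers, betas)))
print("largest sum rate found:", round(c1_su - worst_gap, 3), "C_1^su:", round(c1_su, 3))
```

With equal channel gains, the per-user terms telescope to $B\log_2(1 + (P_1^{\mathrm{dl}} + \cdots + P_K^{\mathrm{dl}})\beta/(BN_0))$. This mirrors the sum rate in Example 6.8.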


Figure 6.22 exemplifies the rate region achieved with NOMA for K = 3
users. The considered setup is a downlink counterpart of Figure 6.10 with
$\frac{P\beta_1}{3BN_0} = 10$, $\frac{P\beta_2}{3BN_0} = 5$, $\frac{P\beta_3}{3BN_0} = 2.5$, and B = 10 MHz. The shape of the outer
boundary is indicated by a collection of curved lines that lie on the Pareto
boundary. The lines are generated by fixing one of the transmit powers and
then varying the others such that $P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}} + P_3^{\mathrm{dl}} = P$. Each point on the
boundary represents a specific tradeoff between the users' performance. The
maximum sum rate is achieved at the single-user capacity point $(C_1^{\mathrm{su}}, 0, 0)$.


6.4.3 Downlink Multi-User MIMO with Non-Linear Processing 


The reason that NOMA cannot increase the sum rate compared with FDMA 
is that all the transmitted signals propagate in the same way because the 
base station utilizes a single antenna. We have M = 1 transmit antenna and 
a total of K receive antennas, so the multiplexing gain of the corresponding 
point-to-point MIMO channel is min(M, K) = M = 1. This means that the 
sum rate in the NOMA setup cannot surpass the capacity of the point-to-point 
Figure 6.22: Example of a downlink rate region for K = 3 users when using NOMA.


MIMO setup.$^{7}$ Hence, it is unsurprising that the sum capacity with NOMA
is limited by what can be achieved when serving one user at a time using
FDMA. To serve K users efficiently in the downlink, the base station should
have $M \geq K$ antennas so that a multiplexing gain of $\min(M, K) = K$
is theoretically achievable. This will enable the base station to send the K 
signals with substantially different spatial directivity so that the inter-user 
interference can be managed through clever processing/precoding at the 
transmitter side. This brings us to a downlink multi-user MIMO setup, where 
the multiple inputs are the M transmit antennas and the multiple outputs 
are the K user antennas; that is, the MIMO terminology is utilized even if 
each user device only has a single antenna. 

We begin by considering a discrete memoryless channel with a transmitting
base station equipped with $M \geq 2$ antennas and K = 2 receiving single-antenna user devices. Both users are served simultaneously over a bandwidth
of B Hz. The maximum transmit power $P$ is divided between the users, such
that $P_k^{\mathrm{dl}} \in [0, P]$ is the power allocated to user $k$ and $P_1^{\mathrm{dl}} + P_2^{\mathrm{dl}} \leq P$. Moreover,
each user is assigned a unit-norm precoding vector $\mathbf{p}_k \in \mathbb{C}^M$. The received
signal $y_k[l] \in \mathbb{C}$ at user $k$ at the discrete time $l$ is

$$y_k[l] = \mathbf{h}_k^{\mathrm{T}}\big(\mathbf{p}_1 x_1[l] + \mathbf{p}_2 x_2[l]\big) + n_k[l], \tag{6.101}$$


$^{7}$The essential difference between the broadcast channel and point-to-point MIMO channel
is that the receiving users cannot decode signals cooperatively in the former setup, which is a 
restriction that can only lower the sum capacity. 
where $x_i[l]$ is the data signal designated for user $i$, for $i = 1,2$. The symbol
power is $P_i^{\mathrm{dl}}/B$ and we assume Gaussian codebooks: $x_i[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_i^{\mathrm{dl}}/B)$.
The channel vector to user $k$ is denoted by $\mathbf{h}_k \in \mathbb{C}^M$, while $n_k[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$
is independent receiver noise. This channel is illustrated in Figure 6.23.

The uplink multi-user MIMO capacity region was obtained in Section 6.3.3
by utilizing two processing components: SIC and LMMSE combining. The
capacity-achieving downlink operation has counterparts to these components,
but SIC must be replaced by another interference cancellation technique. In the
downlink NOMA scenario considered in the last section, the users were
ordered based on their channel gains as $|h_1|^2 \geq |h_2|^2$, and we noticed that the
first user can always decode the second user's signal thanks to its stronger
channel. This enabled SIC to be used to achieve the capacity. However, the
same principle cannot be applied with $M \geq 2$ antennas because even if we
order the users so that $\|\mathbf{h}_1\|^2 \geq \|\mathbf{h}_2\|^2$, the precoding vectors can be selected
so that neither user can decode the other user's signal.$^{8}$


Example 6.9. Consider a scenario with M = 2 antennas where the channels
are $\mathbf{h}_1 = [1, 1]^{\mathrm{T}}$ and $\mathbf{h}_2 = [1, 0]^{\mathrm{T}}$. Suppose that $\frac{P_1^{\mathrm{dl}}}{BN_0} = \frac{P_2^{\mathrm{dl}}}{BN_0} = 1$ and MRT is
used for precoding. Can user 1 decode the signal meant for user 2?
If user 2 treats the interfering signal as noise, it achieves the rate

$$\log_2\left(1 + \frac{P_2^{\mathrm{dl}}|\mathbf{h}_2^{\mathrm{T}}\mathbf{p}_2|^2}{P_1^{\mathrm{dl}}|\mathbf{h}_2^{\mathrm{T}}\mathbf{p}_1|^2 + BN_0}\right) = \log_2\left(1 + \frac{1}{\frac{1}{2} + 1}\right) \approx 0.74 \text{ bit/symbol}, \tag{6.102}$$

because $\mathbf{p}_1 = \frac{1}{\sqrt{2}}\mathbf{h}_1^{*}$ and $\mathbf{p}_2 = \mathbf{h}_2^{*}$ when MRT is used. User 1 can only decode
this signal if it is encoded at a rate that is lower than or equal to

$$\log_2\left(1 + \frac{P_2^{\mathrm{dl}}|\mathbf{h}_1^{\mathrm{T}}\mathbf{p}_2|^2}{P_1^{\mathrm{dl}}|\mathbf{h}_1^{\mathrm{T}}\mathbf{p}_1|^2 + BN_0}\right) = \log_2\left(1 + \frac{1}{2 + 1}\right) \approx 0.42 \text{ bit/symbol}. \tag{6.103}$$

Since 0.42 < 0.74, user 1 cannot decode the signal designated for user 2,
even though it has a stronger channel (i.e., $\|\mathbf{h}_1\|^2 = 2$ and $\|\mathbf{h}_2\|^2 = 1$). A similar
computation shows that user 2 cannot decode the signal designated for user
1, so neither of them can apply SIC. The multi-antenna precoding creates this
effect because it reduces inter-user interference compared to the single-antenna
case considered in NOMA. Another contributing factor to this result is that
there is no unique ordering of vectors from strong to weak.
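The numbers in Example 6.9 can be reproduced with a short computation. As in the example, the sketch assumes real channels, $P_1^{\mathrm{dl}}/(BN_0) = P_2^{\mathrm{dl}}/(BN_0) = 1$, and MRT precoding $\mathbf{p}_k = \mathbf{h}_k^{*}/\|\mathbf{h}_k\|$. The helper names are hypothetical.

```python
import math


def mrt(h):
    """MRT precoder p = h^* / ||h|| for a channel used as h^T p."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    return [complex(x).conjugate() / norm for x in h]


def gain(h, p):
    """Effective channel gain |h^T p|^2."""
    return abs(sum(a * b for a, b in zip(h, p))) ** 2


def rate(snr_sig, g_sig, snr_int, g_int):
    """log2(1 + SINR) in bit/symbol with the noise power normalized to 1."""
    return math.log2(1 + snr_sig * g_sig / (snr_int * g_int + 1))


h1, h2 = [1.0, 1.0], [1.0, 0.0]
snr1 = snr2 = 1.0                 # P_k^dl / (B N0)
p1, p2 = mrt(h1), mrt(h2)

r2_at_user2 = rate(snr2, gain(h2, p2), snr1, gain(h2, p1))   # (6.102)
r2_at_user1 = rate(snr2, gain(h1, p2), snr1, gain(h1, p1))   # (6.103)
r1_at_user1 = rate(snr1, gain(h1, p1), snr2, gain(h1, p2))   # user 1's own rate
r1_at_user2 = rate(snr1, gain(h2, p1), snr2, gain(h2, p2))   # user 2 decoding x_1
print(round(r2_at_user2, 2), round(r2_at_user1, 2))  # 0.74 0.42
```

The last two lines give 1 and about 0.32 bit/symbol. This confirms the "similar computation" in the example: user 2 cannot decode the signal of user 1 either.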


Instead of relying on interference cancellation at the receiving users, the 
base station can arrange a kind of interference subtraction before the data 


$^{8}$One can create special cases, called degraded broadcast channels, where the SIC procedure
from Section 6.4.2 can also be utilized with multiple transmit antennas. One example is when
$\mathbf{h}_1$ and $\mathbf{h}_2$ are equal except for a scaling factor. However, these cases are unlikely to occur in
practical scenarios.
Figure 6.23: A discrete memoryless downlink multi-user MIMO channel with M transmit
antennas and K = 2 receiving single-antenna users. The two input signals are $x_1[l]$ and $x_2[l]$,
where $l$ is a discrete-time index. The output at user $k$ is $y_k[l] = \mathbf{h}_k^{\mathrm{T}}(\mathbf{p}_1 x_1[l] + \mathbf{p}_2 x_2[l]) + n_k[l]$ for
$k = 1,2$, where $\mathbf{h}_1, \mathbf{h}_2$ are the channel vectors, $\mathbf{p}_1, \mathbf{p}_2$ are the precoding vectors, and $n_1[l], n_2[l]$
are the independent complex Gaussian receiver noise terms.


signals leave its antennas. This approach builds on an information-theoretic
result from [94], which considers the SISO channel shown in Figure 6.24.
Its unique characteristic is that an extra interfering signal $\iota$ is added
to the received signal. Suppose this interfering signal is random but known
to the transmitter. In that case, the channel capacity is the same as if the
interference were not there, even if the receiver is unaware of the realization
of the interference.


Theorem 6.4. Consider the discrete memoryless channel in Figure 6.24 with
input $x \in \mathbb{C}$ and output $y \in \mathbb{C}$ given by

$$y = h x + \iota + n, \tag{6.104}$$

where $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise and $\iota \sim \mathcal{N}_{\mathbb{C}}(0, P_\iota)$ is an interfering
signal that is only known at the transmitter. Suppose the input distribution
is feasible whenever the symbol power satisfies $\mathbb{E}\{|x|^2\} \leq q$ and $h \in \mathbb{C}$ is a
known constant. The channel capacity is

$$C = \log_2\left(1 + \frac{q|h|^2}{N_0}\right) \text{ bit/symbol} \tag{6.105}$$

and is achieved when the input is distributed as $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$.


This somewhat surprising result is known as dirty paper coding (DPC) due 
to Max Costa’s analogy in [94] between the proposed transmission scheme 
and how one can write a message on a paper that contains dirt spots. The 
paper represents the channel and the dirt is the interference that the trans- 
mitter/writer knows beforehand. The transmitter can write the message by 
adding ink so that the combination of ink and dirt becomes a message that 


Figure 6.24: A discrete memoryless SISO channel with input $x[l]$ and output $y[l] = h x[l] + \iota[l] + n[l]$, where $l$ is a discrete-time index, $h$ is the channel response, $\iota[l]$ is an interfering signal
that is known at the transmitter, and $n[l]$ is the independent Gaussian receiver noise.


the receiver/reader can understand without distinguishing ink from dirt. The 
main point is to utilize the dirt, not combat it. 
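To put Theorem 6.4 into numbers, the short Python sketch below (our illustration, with arbitrarily chosen values of $q$, $h$, $N_0$, and $P_\iota$) compares the capacity (6.105) with the rate obtained if the interference were instead treated as additional Gaussian noise. The function names are hypothetical helpers introduced here.

```python
import numpy as np

def dpc_capacity(q, h, N0):
    """Capacity (6.105) in bit/symbol: the known interference has no effect."""
    return np.log2(1 + q * abs(h) ** 2 / N0)

def interference_as_noise_rate(q, h, N0, P_iota):
    """Rate in bit/symbol if the interference is instead treated as extra Gaussian noise."""
    return np.log2(1 + q * abs(h) ** 2 / (N0 + P_iota))

q, h, N0 = 1.0, 1.0, 0.1  # illustrative values giving the SNR q|h|^2/N0 = 10
for P_iota in [0.0, 0.1, 1.0, 10.0]:
    print(f"P_iota = {P_iota:4.1f}: DPC {dpc_capacity(q, h, N0):.3f}, "
          f"as noise {interference_as_noise_rate(q, h, N0, P_iota):.3f} bit/symbol")
```

The DPC capacity stays at $\log_2(11) \approx 3.46$ bit/symbol for every $P_\iota$, whereas the rate obtained by treating the interference as noise collapses as $P_\iota$ grows.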

We will not detail the proof of Theorem 6.4, which can be found in [94], or convey the precise implementation details, but only describe the fundamental principles of DPC. The transmitter and receiver take the codebook that could have been used to achieve the capacity in the absence of the interfering signal $\iota$ and augment it. Figure 6.25 illustrates this augmentation, where the black points around the origin represent the original Gaussian codebook. There are six copies of this codebook, highlighted with different colors, that are sufficiently far away not to overlap but sufficiently close not to create holes in between. For a given data symbol $x$ and (normalized) interfering signal $\iota/h$, the transmitter determines which copy of $x$ is closest to $\iota/h$ in the complex plane. We denote the closest copy by $\tilde{x}$, as illustrated in the figure. Instead of transmitting $\tilde{x}$ directly, we transmit $a = \tilde{x} - \iota/h$, because the receiver will then observe $ha + \iota + n = h\tilde{x} + n$, which is free from interference (but still contains noise). To make DPC efficient, the distance between the copied codebooks must be selected precisely so that $a \sim \mathcal{N}_{\mathbb{C}}(0,q)$. Due to the augmentation, the receiver has many more potential constellation points to consider during signal detection, which increases the decoding complexity, but the design does not create extra decoding errors since the copies are relatively far apart.
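The copy-and-shift principle can be sketched with a lattice-based simplification known as Tomlinson-Harashima precoding, which we substitute here for the capacity-achieving construction in [94]. In this Python sketch, a QPSK codebook replaces the Gaussian codebook and the copies are placed on a square grid with spacing $L$ (both are our assumptions). The encoder picks the copy $\tilde{x}$ closest to $\iota/h$ and sends $a = \tilde{x} - \iota/h$; the receiver divides by $h$ and folds the result back onto the original codebook without ever knowing $\iota$.

```python
import numpy as np

def fold(z, L):
    """Subtract the nearest shift L*(m + jn), so Re and Im end up in [-L/2, L/2]."""
    return z - L * (np.round(z.real / L) + 1j * np.round(z.imag / L))

def encode(x, iota, h, L):
    """Pick the copy x_tilde = x + L*(m + jn) closest to iota/h and send a = x_tilde - iota/h."""
    return fold(x - iota / h, L)

def decode(y, h, L, codebook):
    """Equalize (y/h = x_tilde + n/h), fold back onto the original codebook, and slice."""
    z = fold(y / h, L)
    return codebook[np.argmin(np.abs(z[:, None] - codebook[None, :]), axis=1)]

rng = np.random.default_rng(1)
codebook = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])  # QPSK stand-in for the original codebook
L = 4.0                                                   # copy spacing; QPSK fits inside [-2, 2]^2
h = 0.8 * np.exp(0.3j)                                    # known channel coefficient (illustrative)
n_sym = 1000
x = rng.choice(codebook, size=n_sym)
iota = 5.0 * (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym))  # strong, known interference
a = encode(x, iota, h, L)
noise = 0.05 * (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym))
y = h * a + iota + noise                                  # channel model (6.104)
x_hat = decode(y, h, L, codebook)
print("symbol errors:", np.count_nonzero(x_hat != x))
print("E|a|^2 =", np.mean(np.abs(a) ** 2), "vs E|x|^2 =", np.mean(np.abs(x) ** 2))
```

Since $a$ ends up roughly uniformly distributed over a square rather than Gaussian, this simplified scheme pays a power penalty ($\mathbb{E}\{|a|^2\}$ is close to $8/3$ instead of $2$ here), which is one reason why it does not reach the capacity in (6.105).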


In the remainder of this section, we will utilize DPC to characterize the capacity region. When there are multiple users, their data signals must be encoded sequentially because DPC can only eliminate interference from already encoded signals. When there are $K = 2$ users, the user whose signal is encoded first must treat interference as noise, while the other user can benefit from DPC to get an interference-free transmission. This sequential procedure makes DPC a non-linear processing scheme. For every choice of precoding vectors and encoding order, we can generate the outer boundary of the rate region by varying the power allocation so that $P_1^{\rm dl} + P_2^{\rm dl} \leq P$. The capacity region will then be the union of all these rate regions; however, it is computationally hard to generate the region in this way. Searching over different encoding orders and power allocations is manageable, as shown earlier in this chapter, but there are too many ways of selecting the precoding vectors. Hence, we will look deeper into this issue to identify the optimal precoding when the other parameters have been selected.

[Figure 6.25: the original codebook and its colored copies in the complex plane; axes “Real part” and “Imaginary part”.]

Figure 6.25: Illustration of the main principle of DPC. The original codebook is augmented with the colored codeword copies to fill the complex plane. When the data symbol $x$ is to be transmitted and the interfering signal is $\iota$, the transmitter identifies the copy of $x$ closest to $\iota/h$, denoted by $\tilde{x}$. The transmitted signal is then selected as $a = \tilde{x} - \iota/h$ so that the summation of $a$ and the normalized interfering signal $\iota/h$ becomes $\tilde{x}$. The interference thereby becomes invisible to the receiver.


If the signal designated for user 1 is encoded first, the user’s received signal in (6.101) will be affected by interference from the signal meant for user 2. By treating this signal as additional noise with the power $P_2^{\rm dl}|\mathbf{h}_1^T\mathbf{p}_2|^2$, the achievable rate becomes

$$R_1 = \mathcal{C}\left(\frac{P_1^{\rm dl}|\mathbf{h}_1^T\mathbf{p}_1|^2}{P_2^{\rm dl}|\mathbf{h}_1^T\mathbf{p}_2|^2 + BN_0}\right). \tag{6.106}$$


When the signal to user 1 has been determined, the transmission to user 2 can be encoded using DPC. It then follows from Theorem 6.4 that the achievable rate will be the same as in the absence of interference; that is, as if the received signal was $y_2[l] = \mathbf{h}_2^T\mathbf{p}_2x_2[l] + n_2[l]$. The resulting rate for user 2 is

$$R_2 = \mathcal{C}\left(\frac{P_2^{\rm dl}|\mathbf{h}_2^T\mathbf{p}_2|^2}{BN_0}\right). \tag{6.107}$$


The rates in (6.106) and (6.107) can be computed for any precoding vectors. It is challenging to select the precoding because $\mathbf{p}_2$ affects the numerator of the SINR of user 2 and the denominator of the SINR of user 1. Hence, we
must make a tradeoff between maximizing the received signal power of user 2 
and limiting the interference caused to user 1. This is a crucial difference from 
the uplink, where each combining vector only affected its designated user and 
could be optimized without making tradeoffs. Interestingly, a mathematical 
connection between the uplink and downlink can be utilized to identify the 
optimal precoding. This is known as the uplink-downlink duality. 
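The tradeoff can be made concrete with the following Python sketch, which evaluates (6.106) and (6.107) for two hypothetical choices of $\mathbf{p}_2$ while $\mathbf{p}_1 = \mathbf{h}_1^*/\|\mathbf{h}_1\|$ is kept fixed. We assume normalized units ($B = N_0 = 1$), arbitrary powers, and the half-wavelength ULA LOS vector $[\mathbf{h}]_m = e^{-\jmath\pi m\sin\varphi}$ as a stand-in for the model in (4.23), which may use a different convention.

```python
import numpy as np

def ula_los(M, phi):
    """Assumed half-wavelength ULA LOS vector with unit gain per antenna."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(phi))

def dpc_rates(h1, h2, p1, p2, P1, P2, B, N0):
    """Rates (6.106)-(6.107): user 1 encoded first (interference as noise), user 2 protected by DPC."""
    sinr1 = P1 * abs(h1 @ p1) ** 2 / (P2 * abs(h1 @ p2) ** 2 + B * N0)
    sinr2 = P2 * abs(h2 @ p2) ** 2 / (B * N0)
    return B * np.log2(1 + sinr1), B * np.log2(1 + sinr2)

M, B, N0, P1, P2 = 4, 1.0, 1.0, 2.5, 2.5   # normalized, illustrative values
h1, h2 = ula_los(M, -np.pi / 20), ula_los(M, np.pi / 20)
p1 = h1.conj() / np.linalg.norm(h1)

# Option 1: point p2 straight at user 2 (maximizes the numerator of user 2's SINR)
p2_mrt = h2.conj() / np.linalg.norm(h2)
# Option 2: remove the component that reaches user 1, so that h1^T p2 = 0
v = h2.conj() - h1.conj() * (h1 @ h2.conj()) / np.linalg.norm(h1) ** 2
p2_null = v / np.linalg.norm(v)

R_mrt = dpc_rates(h1, h2, p1, p2_mrt, P1, P2, B, N0)
R_null = dpc_rates(h1, h2, p1, p2_null, P1, P2, B, N0)
print("p2 towards user 2: R1 = %.3f, R2 = %.3f" % R_mrt)
print("p2 nulling user 1: R1 = %.3f, R2 = %.3f" % R_null)
```

Pointing $\mathbf{p}_2$ at user 2 maximizes $R_2$ but lowers $R_1$, whereas nulling the interference at user 1 does the opposite; in general, neither extreme is the best choice.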

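Consider the dual uplink scenario where the same two users send their signals to the base station using the transmit powers $P_1^{\rm ul}$ and $P_2^{\rm ul}$, respectively. The channel vectors $\mathbf{h}_1, \mathbf{h}_2$ are the same as before, and the receive combining vectors are selected based on the precoding vectors as $\mathbf{w}_1 = \mathbf{p}_1^*$ and $\mathbf{w}_2 = \mathbf{p}_2^*$. If the signal from user 2 is decoded first and SIC is utilized to remove its interference before decoding the signal from user 1, the same approach as in Section 6.3.3 can be used to compute the achievable uplink rates as

$$R_1^{\rm ul} = \mathcal{C}\left(\frac{P_1^{\rm ul}|\mathbf{p}_1^T\mathbf{h}_1|^2}{\|\mathbf{p}_1\|^2BN_0}\right) = \mathcal{C}\left(\frac{P_1^{\rm ul}|\mathbf{p}_1^T\mathbf{h}_1|^2}{BN_0}\right), \tag{6.108}$$

$$R_2^{\rm ul} = \mathcal{C}\left(\frac{P_2^{\rm ul}|\mathbf{p}_2^T\mathbf{h}_2|^2}{P_1^{\rm ul}|\mathbf{p}_2^T\mathbf{h}_1|^2 + \|\mathbf{p}_2\|^2BN_0}\right) = \mathcal{C}\left(\frac{P_2^{\rm ul}|\mathbf{p}_2^T\mathbf{h}_2|^2}{P_1^{\rm ul}|\mathbf{p}_2^T\mathbf{h}_1|^2 + BN_0}\right), \tag{6.109}$$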


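where we simplified the expressions by utilizing that the precoding vectors have unit norm. Suppose we want user $k$ to achieve a specific rate $\mathcal{C}(\gamma_k)$ in both the uplink and downlink, for $k = 1,2$, where $\gamma_k > 0$ denotes the desired SINR value. We can find the downlink transmit powers that lead to these rates by solving the equations

$$\frac{P_1^{\rm dl}|\mathbf{h}_1^T\mathbf{p}_1|^2}{P_2^{\rm dl}|\mathbf{h}_1^T\mathbf{p}_2|^2 + BN_0} = \gamma_1, \quad \frac{P_2^{\rm dl}|\mathbf{h}_2^T\mathbf{p}_2|^2}{BN_0} = \gamma_2 \quad \Leftrightarrow \quad \underbrace{\begin{bmatrix} \frac{|\mathbf{h}_1^T\mathbf{p}_1|^2}{\gamma_1BN_0} & -\frac{|\mathbf{h}_1^T\mathbf{p}_2|^2}{BN_0} \\ 0 & \frac{|\mathbf{h}_2^T\mathbf{p}_2|^2}{\gamma_2BN_0} \end{bmatrix}}_{=\boldsymbol{\Gamma}}\begin{bmatrix} P_1^{\rm dl} \\ P_2^{\rm dl} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{6.110}$$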


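This is a linear system of equations, so the solution is $[P_1^{\rm dl}, P_2^{\rm dl}]^T = \boldsymbol{\Gamma}^{-1}[1, 1]^T$ if the matrix $\boldsymbol{\Gamma}$ is invertible.⁹ The corresponding equations for the dual uplink transmit powers $P_1^{\rm ul}, P_2^{\rm ul}$ are

$$\frac{P_1^{\rm ul}|\mathbf{p}_1^T\mathbf{h}_1|^2}{BN_0} = \gamma_1, \quad \frac{P_2^{\rm ul}|\mathbf{p}_2^T\mathbf{h}_2|^2}{P_1^{\rm ul}|\mathbf{p}_2^T\mathbf{h}_1|^2 + BN_0} = \gamma_2 \quad \Leftrightarrow \quad \underbrace{\begin{bmatrix} \frac{|\mathbf{p}_1^T\mathbf{h}_1|^2}{\gamma_1BN_0} & 0 \\ -\frac{|\mathbf{p}_2^T\mathbf{h}_1|^2}{BN_0} & \frac{|\mathbf{p}_2^T\mathbf{h}_2|^2}{\gamma_2BN_0} \end{bmatrix}}_{=\boldsymbol{\Gamma}^T}\begin{bmatrix} P_1^{\rm ul} \\ P_2^{\rm ul} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{6.111}$$

⁹ The transmit powers must be positive, and this condition is only satisfied for some invertible matrices $\boldsymbol{\Gamma}$, which showcases that some rate combinations can never be achieved. The necessary and sufficient condition is that $\mathbf{I} - \boldsymbol{\Gamma}_{\rm diag}^{-1}\boldsymbol{\Gamma}$ has eigenvalues in the range $(-1, 1)$ [95], where $\boldsymbol{\Gamma}_{\rm diag}$ is the diagonal matrix having the same diagonal entries as $\boldsymbol{\Gamma}$. In this book, we only want to establish that rates achievable in the uplink with non-zero power coefficients are also achievable in the downlink with non-zero power coefficients and vice versa, which implies that the condition is automatically satisfied.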


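The only difference from the downlink is that the uplink equation system contains $\boldsymbol{\Gamma}^T$ instead of $\boldsymbol{\Gamma}$. This showcases that the downlink and uplink SINR expressions contain the same kind of interference terms but at different places: the interference term $|\mathbf{p}_2^T\mathbf{h}_1|^2 = |\mathbf{h}_1^T\mathbf{p}_2|^2$ affects user 1 in the downlink and user 2 in the uplink. Due to this asymmetry, the values of $P_1^{\rm dl}, P_2^{\rm dl}$ that deliver the desired downlink rates are generally different from the values of $P_1^{\rm ul}, P_2^{\rm ul}$ that deliver the same uplink rates. However, the values are tightly related because

$$P_1^{\rm dl} + P_2^{\rm dl} = \begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} P_1^{\rm dl} \\ P_2^{\rm dl} \end{bmatrix} = \begin{bmatrix} 1 & 1 \end{bmatrix}\boldsymbol{\Gamma}^{-1}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} P_1^{\rm ul} & P_2^{\rm ul} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = P_1^{\rm ul} + P_2^{\rm ul}, \tag{6.112}$$

where the second and third equalities follow from (6.110) and (6.111), respectively. Hence, when the combining vectors are the complex conjugates of the precoding vectors, the same data rates are achievable in the downlink and uplink using the same total transmit power but allocating it differently between the users. If the uplink powers are known, the corresponding downlink powers can be computed as

$$\begin{bmatrix} P_1^{\rm dl} \\ P_2^{\rm dl} \end{bmatrix} = \boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}^T\begin{bmatrix} P_1^{\rm ul} \\ P_2^{\rm ul} \end{bmatrix}. \tag{6.113}$$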


This result also has implications for the precoding selection. We know from Section 6.3.3 that the uplink rate in (6.108) is maximized by $\mathbf{p}_1 = \mathbf{h}_1^*/\|\mathbf{h}_1\|$, which corresponds to MRC. Moreover, we know that the uplink rate in (6.109) is maximized by the LMMSE combining in (6.36). By revising the notation, including complex conjugates, and normalizing the expression to have unit norm, we obtain the SINR-maximizing precoding vector

$$\mathbf{p}_2 = \frac{\left(P_1^{\rm ul}\mathbf{h}_1^*\mathbf{h}_1^T + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_2^*}{\left\|\left(P_1^{\rm ul}\mathbf{h}_1^*\mathbf{h}_1^T + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_2^*\right\|}. \tag{6.114}$$


The uplink-downlink duality dictates that the same rate points $(\mathcal{C}(\gamma_1), \mathcal{C}(\gamma_2))$ are achievable in both uplink and downlink; thus, if some points can only be reached by the optimal uplink combining, we must use the complex conjugates of the same vectors for downlink precoding to reach those points. We have thereby established a mechanism to transform optimal uplink combining vectors into downlink precoding vectors that are optimal for reaching the same rate point, which resolves the complicated tradeoff. The intuition is that the spatial direction (in the $M$-dimensional vector space) in which the base station should listen to the uplink signal from user $k$ is the same as that in which it should transmit back to user $k$ in the downlink.
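The following Python sketch checks the duality numerically under the same kind of assumptions as before (normalized $B = N_0 = 1$, an assumed half-wavelength ULA LOS model, and arbitrarily chosen uplink powers). It builds $\mathbf{p}_1$ from MRC and $\mathbf{p}_2$ from (6.114), computes the uplink SINRs $\gamma_1, \gamma_2$, maps the uplink powers to downlink powers through (6.113), and then evaluates the downlink SINRs (6.106)-(6.107).

```python
import numpy as np

def ula_los(M, phi):
    """Assumed half-wavelength ULA LOS vector with unit gain per antenna."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(phi))

M, B, N0 = 4, 1.0, 1.0                     # normalized, illustrative values
h1, h2 = ula_los(M, -np.pi / 20), ula_los(M, np.pi / 20)
P_ul = np.array([3.0, 2.0])                # arbitrary virtual uplink powers

# Precoders = conjugated optimal combiners: MRC for user 1, LMMSE (6.114) for user 2
p1 = h1.conj() / np.linalg.norm(h1)
v = np.linalg.solve(P_ul[0] * np.outer(h1.conj(), h1) + B * N0 * np.eye(M), h2.conj())
p2 = v / np.linalg.norm(v)

# Uplink SINRs from (6.108)-(6.109); they define the target values gamma_k
gamma = np.array([P_ul[0] * abs(p1 @ h1) ** 2 / (B * N0),
                  P_ul[1] * abs(p2 @ h2) ** 2 / (P_ul[0] * abs(p2 @ h1) ** 2 + B * N0)])

# Matrix Gamma from (6.110); the dual uplink system (6.111) uses its transpose
Gamma = np.array([[abs(h1 @ p1) ** 2 / (gamma[0] * B * N0), -abs(h1 @ p2) ** 2 / (B * N0)],
                  [0.0, abs(h2 @ p2) ** 2 / (gamma[1] * B * N0)]])
P_dl = np.linalg.solve(Gamma, Gamma.T @ P_ul)   # downlink powers via (6.113)

# Downlink SINRs (6.106)-(6.107), with DPC protecting user 2
sinr_dl = np.array([P_dl[0] * abs(h1 @ p1) ** 2 / (P_dl[1] * abs(h1 @ p2) ** 2 + B * N0),
                    P_dl[1] * abs(h2 @ p2) ** 2 / (B * N0)])
print("uplink powers  ", P_ul, "total", P_ul.sum())
print("downlink powers", P_dl, "total", P_dl.sum())
print("uplink SINRs   ", gamma)
print("downlink SINRs ", sinr_dl)
```

The two SINR pairs coincide and the total powers agree, as predicted by (6.112), while the individual powers are split differently.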

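The uplink rate region with two users was characterized in (6.44) for the case when the users use the same transmit power. When we instead assign arbitrary uplink powers $P_1^{\rm ul}, P_2^{\rm ul}$, we can generalize the expression as

$$\mathcal{R}_{P_1^{\rm ul},P_2^{\rm ul}} = \left\{(R_1, R_2) : \begin{array}{l} R_1 \leq \mathcal{C}\left(\frac{P_1^{\rm ul}\|\mathbf{h}_1\|^2}{BN_0}\right), \quad R_2 \leq \mathcal{C}\left(\frac{P_2^{\rm ul}\|\mathbf{h}_2\|^2}{BN_0}\right), \\ R_1 + R_2 \leq B\log_2\left(\det\left(\mathbf{I}_M + \frac{P_1^{\rm ul}}{BN_0}\mathbf{h}_1\mathbf{h}_1^H + \frac{P_2^{\rm ul}}{BN_0}\mathbf{h}_2\mathbf{h}_2^H\right)\right) \end{array}\right\}. \tag{6.115}$$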


This scenario is typically called the virtual dual uplink since we cannot allocate 
the total uplink power arbitrarily between the users in practice. However, 
the power allocation feature exists in the downlink, and the uplink-downlink 
duality connects the downlink scenario to these hypothetical/virtual dual 
uplink scenarios with arbitrary power splits between the users. In particular, 
the downlink rate region that is achievable in multi-user MIMO setups with 
DPC is the union of all conceivable virtual uplink rate regions: 


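$$\mathcal{R} = \bigcup_{P_1^{\rm ul}, P_2^{\rm ul} \geq 0 \,:\, P_1^{\rm ul} + P_2^{\rm ul} \leq P}\mathcal{R}_{P_1^{\rm ul},P_2^{\rm ul}}. \tag{6.116}$$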


This is the largest achievable rate region of the downlink multi-user MIMO 
channel, which is formally proved in [96], [97], so we call it the capacity region. 

The downlink rate region with M = 4 and B = 10 MHz is exemplified in 
Figure 6.26, as a continuation to the NOMA scenario in Figure 6.21 with 
the LOS channel model in (4.23) where the UEs have the azimuth angles $\varphi_1 = -\pi/20$ and $\varphi_2 = \pi/20$. The users have unequal channel qualities, with SNRs of 10 and 5 under an equal power allocation of $P/2$
per user. Nine virtual uplink regions, obtained by different power splits, 
are illustrated using blue-dotted lines. These regions have the pentagonal 
shape typical for the uplink. The downlink region is the union of all such 
virtual uplink regions; thus, it is larger than any given uplink region thanks 
to the ability to allocate downlink power arbitrarily between users. The 
Pareto boundary has a smoother shape where each point is obtained from one 
specific uplink region, following from the uplink-downlink duality. Suppose 
we start from a point in the virtual uplink. In that case, we can obtain the 
corresponding downlink transmission method by using the same combining 
vectors as precoding vectors, transforming the uplink powers into downlink 
powers using (6.113), and encoding the downlink signals using DPC in the 
opposite order as the uplink signals are decoded using SIC. 

Figure 6.26: Example of the downlink rate region for $K = 2$ users with multi-user MIMO, DPC, and $M = 4$ antennas. The region is the union of all possible dual uplink rate regions with different transmit power divisions between the users. This is a continuation of the example in Figure 6.21. Many red-marked points achieve the maximum sum rate, while a single point (red star) gives max-min fairness.

The operating points that provide the maximum sum rate and max-min fairness are indicated in Figure 6.26, and there are multiple points in the former case. The sum rate with the optimal receive combining in the virtual uplink can be computed using (6.43) with arbitrary uplink powers $P_1^{\rm ul}, P_2^{\rm ul}$ as


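$$R_1^{\rm ul} + R_2^{\rm ul} = \mathcal{C}\left(\frac{P_1^{\rm ul}\|\mathbf{h}_1\|^2}{BN_0}\right) + \mathcal{C}\left(P_2^{\rm ul}\mathbf{h}_2^H\left(P_1^{\rm ul}\mathbf{h}_1\mathbf{h}_1^H + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_2\right) = B\log_2\left(\det\left(\mathbf{I}_M + \frac{P_1^{\rm ul}}{BN_0}\mathbf{h}_1\mathbf{h}_1^H + \frac{P_2^{\rm ul}}{BN_0}\mathbf{h}_2\mathbf{h}_2^H\right)\right). \tag{6.117}$$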


This expression is symmetric with respect to the user indices, which means 
that the same sum rate can be achieved irrespective of which user signal is 
decoded first in the virtual uplink (i.e., encoded last in the downlink). The 
maximum downlink sum rate is obtained by maximizing this expression with 
respect to the uplink powers [98]: 


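$$\underset{P_1^{\rm ul}, P_2^{\rm ul} \geq 0 \,:\, P_1^{\rm ul} + P_2^{\rm ul} \leq P}{\text{maximize}} \quad B\log_2\left(\det\left(\mathbf{I}_M + \frac{P_1^{\rm ul}}{BN_0}\mathbf{h}_1\mathbf{h}_1^H + \frac{P_2^{\rm ul}}{BN_0}\mathbf{h}_2\mathbf{h}_2^H\right)\right). \tag{6.118}$$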


There is no closed-form solution to this problem, but the objective function is 
a concave function of the power variables. Hence, this is a convex optimization 
problem that can be solved using general-purpose convex solvers [99]. 
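For $K = 2$, the convex problem (6.118) can also be solved by brute force, as in the Python sketch below. Since the objective increases with each power, the optimum uses the full power, so a one-dimensional grid search over $P_1^{\rm ul} \in [0, P]$ with $P_2^{\rm ul} = P - P_1^{\rm ul}$ suffices. The channel model, the weaker second user, and all numerical values are our assumptions.

```python
import numpy as np

def ula_los(M, phi):
    """Assumed half-wavelength ULA LOS vector with unit gain per antenna."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(phi))

def sum_rate(P1, P2, h1, h2, B, N0):
    """Objective of (6.118): B*log2 det(I_M + P1/(B N0) h1 h1^H + P2/(B N0) h2 h2^H)."""
    A = (np.eye(h1.size) + P1 / (B * N0) * np.outer(h1, h1.conj())
         + P2 / (B * N0) * np.outer(h2, h2.conj()))
    return B * np.linalg.slogdet(A)[1] / np.log(2)

M, B, N0, P = 4, 1.0, 1.0, 5.0              # normalized, illustrative values
h1 = ula_los(M, -np.pi / 20)
h2 = np.sqrt(0.5) * ula_los(M, np.pi / 20)  # weaker second user (assumption)

# Full-power splits only: P1 in [0, P] and P2 = P - P1
grid = np.linspace(0.0, P, 2001)
rates = np.array([sum_rate(p, P - p, h1, h2, B, N0) for p in grid])
best = int(np.argmax(rates))
equal = sum_rate(P / 2, P / 2, h1, h2, B, N0)
print(f"best split: P1 = {grid[best]:.4f}, P2 = {P - grid[best]:.4f}, sum rate = {rates[best]:.4f}")
print(f"equal split sum rate = {equal:.4f}")
```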
The derivation of the rate region can be extended to the general case of
K > 2. The uplink-downlink duality is then generalized to consider a virtual 
uplink scenario of the same kind as in Theorem 6.2 but with different powers 
among the users. The rate region is the union of all such virtual uplink regions. 
Each point in the downlink region is achieved by encoding the user signals 
sequentially and applying DPC to protect each user from interference from 
the previously encoded signals. We can summarize the result as follows. 


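Theorem 6.5. Consider a $K$-user discrete memoryless downlink multi-user MIMO channel with the input $\mathbf{p}_1x_1 + \ldots + \mathbf{p}_Kx_K \in \mathbb{C}^M$, where $x_k \in \mathbb{C}$ is the input signal designated for user $k$ and $\mathbf{p}_k \in \mathbb{C}^M$ is the corresponding unit-norm precoding vector. The outputs $y_1, \ldots, y_K \in \mathbb{C}$ at the users are given by

$$y_k = \mathbf{h}_k^T\sum_{i=1}^{K}\mathbf{p}_ix_i + n_k, \quad k = 1, \ldots, K, \tag{6.119}$$

where $n_k \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise and $\mathbf{h}_1, \ldots, \mathbf{h}_K \in \mathbb{C}^M$ are known constant channel vectors. Suppose the input distributions are feasible whenever the symbol powers $\mathbb{E}\{|x_k|^2\} = P_k^{\rm dl}/B$ satisfy $P_1^{\rm dl} + \ldots + P_K^{\rm dl} \leq P$ ($P$ is the maximum transmit power, and $B$ is the bandwidth, i.e., the symbol rate). The capacity region is given by

$$\mathcal{R} = \bigcup_{P_1^{\rm ul}, \ldots, P_K^{\rm ul} \geq 0 \,:\, P_1^{\rm ul} + \ldots + P_K^{\rm ul} \leq P}\mathcal{R}_{P_1^{\rm ul},\ldots,P_K^{\rm ul}}, \tag{6.120}$$

which is the union of the virtual uplink regions

$$\mathcal{R}_{P_1^{\rm ul},\ldots,P_K^{\rm ul}} = \left\{(R_1, \ldots, R_K) : \sum_{k\in\mathcal{K}}R_k \leq B\log_2\left(\det\left(\mathbf{I}_M + \sum_{k\in\mathcal{K}}\frac{P_k^{\rm ul}}{BN_0}\mathbf{h}_k\mathbf{h}_k^H\right)\right) \text{ for all } \mathcal{K} \subseteq \{1, \ldots, K\}, \ R_k \geq 0 \text{ for all } k\right\}. \tag{6.121}$$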


The parameterization in (6.121) reveals the expression for the sum rate, obtained with $\mathcal{K} = \{1, \ldots, K\}$. Hence, we can maximize the sum rate as

$$\underset{P_1^{\rm ul}, \ldots, P_K^{\rm ul} \geq 0 \,:\, P_1^{\rm ul} + \ldots + P_K^{\rm ul} \leq P}{\text{maximize}} \quad B\log_2\left(\det\left(\mathbf{I}_M + \sum_{k=1}^{K}\frac{P_k^{\rm ul}}{BN_0}\mathbf{h}_k\mathbf{h}_k^H\right)\right). \tag{6.122}$$

This is a generalization of the two-user problem in (6.118), and it remains a convex optimization problem that lacks a closed-form solution [98] but can be solved efficiently using any software for solving such problems.
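For general $K$, one simple alternative to a general-purpose solver is iterative water-filling. The Python sketch below follows the averaged sum-power iterative water-filling idea proposed by Jindal et al.; it is our illustration, not necessarily the method used in [98]. In each iteration, user $k$ sees the effective gain $g_k = \mathbf{h}_k^H(\mathbf{I}_M + \sum_{j\neq k}q_j\mathbf{h}_j\mathbf{h}_j^H)^{-1}\mathbf{h}_k$, water-filling is applied jointly over these gains, and the result is averaged with the previous iterate. The normalized powers $q_k = P_k^{\rm ul}/(BN_0)$, the channel gains, and the angles are assumptions.

```python
import numpy as np

def water_fill(gains, total):
    """Maximize sum_k log2(1 + q_k * gains_k) subject to q_k >= 0 and sum_k q_k = total."""
    floors = np.sort(1.0 / gains)
    for n in range(len(floors), 0, -1):       # try n active channels, largest n first
        mu = (total + floors[:n].sum()) / n
        if mu > floors[n - 1]:
            break
    return np.maximum(mu - 1.0 / gains, 0.0)

def log2det(H, q):
    """log2 det(I_M + sum_k q_k h_k h_k^H): the objective of (6.122) divided by B."""
    return np.linalg.slogdet(np.eye(H.shape[0]) + (H * q) @ H.conj().T)[1] / np.log(2)

def sum_power_iwf(H, rho, iters=300):
    """Averaged sum-power iterative water-filling (sketch).

    Columns of H are h_1, ..., h_K; q_k = P_k^ul / (B N0) and rho = P / (B N0).
    """
    M, K = H.shape
    q = np.full(K, rho / K)
    for _ in range(iters):
        g = np.empty(K)
        for k in range(K):
            others = [j for j in range(K) if j != k]
            A = np.eye(M) + (H[:, others] * q[others]) @ H[:, others].conj().T
            g[k] = np.real(H[:, k].conj() @ np.linalg.solve(A, H[:, k]))  # h_k^H A^-1 h_k
        q = ((K - 1) * q + water_fill(g, rho)) / K
    return q

M, rho = 4, 5.0                                      # illustrative values
angles = np.array([-np.pi / 20, np.pi / 20, np.pi / 6])
gains = np.array([1.0, 0.5, 0.25])                   # hypothetical channel gains
H = np.sqrt(gains) * np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(angles)))
q = sum_power_iwf(H, rho)
print("normalized powers q =", np.round(q, 4), "sum =", q.sum())
print("optimized:", log2det(H, q), "equal split:", log2det(H, np.full(3, rho / 3)))
```

A fixed point of the update satisfies the KKT conditions of (6.122): for every user with $q_k > 0$, the derivative $\mathbf{h}_k^H(\mathbf{I}_M + \sum_j q_j\mathbf{h}_j\mathbf{h}_j^H)^{-1}\mathbf{h}_k = g_k/(1 + q_kg_k)$ equals the same value $1/\mu$.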

Figure 6.27: Examples of downlink rate regions for $K = 2$ users with multi-user MIMO and a varying number of antennas $M$, where $M = 1$ corresponds to NOMA. This is a continuation of the example in Figures 6.21 and 6.26. The dotted curves represent the hypothetical cases where inter-user interference is neglected.

The benefit of increasing the number of antennas is illustrated in Figure 6.27 by revisiting the example from Figures 6.21 and 6.26. The rate regions with NOMA (i.e., $M = 1$) and multi-user MIMO with $M = 4$ or $M = 8$ are compared. As the beamforming gain increases with $M$, the single-user capacity points are shifted towards larger values along the two axes. The Pareto boundaries also become increasingly curved, demonstrating how interference becomes less of an issue thanks to favorable propagation, and the sum rate becomes substantially larger than the single-user capacities. The dotted curves illustrate the three hypothetical rate regions obtained without inter-user interference to emphasize this effect further. The difference between the hypothetical interference-free case and the actual Pareto boundary is large for $M = 1$ but tiny for $M = 8$. The curvature in the interference-free cases is caused by the need to divide the total transmit power between the users.


6.4.4 Downlink Multi-User MIMO with Linear Processing 


The last section demonstrated how DPC could be utilized in downlink multi- 
user MIMO to achieve all Pareto optimal operating points in the rate region. 
Unfortunately, this non-linear encoding scheme has some practical drawbacks. 
Firstly, the sequential encoding of the users’ signals leads to an encoding 
delay that grows proportionally to the number of users, and the encoding 
complexity per user signal is also increased. Secondly, the decoding complexity 
at the user devices is increased as the codebook is augmented with many
codebook copies. Thirdly, the individual users’ data rates must be selected 
jointly based on the encoding order, which makes the operation less flexible. 
Finally, the transmitter might have imperfect channel knowledge in practice, 
so the interference can only be partially removed. In this section, we will 
analyze downlink multi-user MIMO without DPC, where each user is subject 
to interference from all other users and treats it as additional noise. This is 
referred to as linear processing and requires the precoding and power allocation 
to be fine-tuned to suppress inter-user interference further. 

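We consider a $K$-user downlink multi-user MIMO channel of the kind defined in Theorem 6.5. The received signal at user $k$ is

$$y_k = \mathbf{h}_k^T\sum_{i=1}^{K}\mathbf{p}_ix_i + n_k, \quad k = 1, \ldots, K, \tag{6.123}$$

where $x_i \sim \mathcal{N}_{\mathbb{C}}(0, P_i^{\rm dl}/B)$ is the data signal sent to user $i$, $P_i^{\rm dl} \geq 0$ is the allocated transmit power, $\mathbf{p}_i \in \mathbb{C}^M$ is the associated unit-norm precoding vector, and $n_k \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. The complete received signal $\mathbf{y} = [y_1, \ldots, y_K]^T$ for all users is expressed as

$$\mathbf{y} = \mathbf{H}^T\mathbf{P}\mathbf{x} + \mathbf{n}, \tag{6.124}$$

where $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K] \in \mathbb{C}^{M\times K}$ is the channel matrix, $\mathbf{P} = [\mathbf{p}_1, \ldots, \mathbf{p}_K] \in \mathbb{C}^{M\times K}$ is the precoding matrix with unit-norm columns, $\mathbf{x} = [x_1, \ldots, x_K]^T$ contains all the data signals, and $\mathbf{n} = [n_1, \ldots, n_K]^T$ contains the noise.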


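Under these conditions, the sum of the interfering signals at user $k$ is

$$\sum_{i=1, i\neq k}^{K}\mathbf{h}_k^T\mathbf{p}_ix_i \sim \mathcal{N}_{\mathbb{C}}\left(0, \sum_{i=1, i\neq k}^{K}\frac{P_i^{\rm dl}}{B}|\mathbf{h}_k^T\mathbf{p}_i|^2\right), \tag{6.125}$$

which has the same distribution as receiver noise but a different variance. By treating the interference as additional noise in the signal decoding, it follows from Corollary 2.1 that user $k$ can achieve the downlink rate

$$R_k^{\rm dl} = B\log_2\left(1 + \frac{P_k^{\rm dl}|\mathbf{h}_k^T\mathbf{p}_k|^2}{\sum_{i=1, i\neq k}^{K}P_i^{\rm dl}|\mathbf{h}_k^T\mathbf{p}_i|^2 + BN_0}\right). \tag{6.126}$$

The same decoding algorithm as in a single-user system can be utilized since DPC is not used. It is instructive to compare this rate to the uplink rate expression in (6.58), under the assumption that the receive combining vectors are selected based on the precoding vectors as $\mathbf{w}_k = \mathbf{p}_k^*$. The uplink rate then becomes

$$R_k^{\rm ul} = \mathcal{C}\left(\frac{P_k^{\rm ul}|\mathbf{p}_k^T\mathbf{h}_k|^2}{\sum_{i=1, i\neq k}^{K}P_i^{\rm ul}|\mathbf{p}_k^T\mathbf{h}_i|^2 + BN_0}\right). \tag{6.127}$$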
There is a striking similarity between (6.126) and (6.127), where the only differences are the power coefficients and that the indices are switched in the interference terms so that $|\mathbf{h}_k^T\mathbf{p}_i|^2$ in the downlink becomes $|\mathbf{p}_k^T\mathbf{h}_i|^2$ in the uplink. This is another instance of the uplink-downlink duality but for systems with linear processing. By following the same approach as in the previous section, one can prove that any combination $(R_1, \ldots, R_K)$ of user rates that is achievable in the downlink is also achievable in the uplink by selecting the combining vectors as $\mathbf{w}_k = \mathbf{p}_k^*$ and using the same total transmit power but allocating it differently between the users. The duality results with linear processing can be traced back to [100], [101]. Since the uplink powers cannot be distributed freely between the users, the duality holds between the downlink scenario and a virtual uplink scenario that allows for power reallocation between users. Hence, the downlink rate region with linear processing is obtained from the uplink region in (6.66) by changing the constraint for how uplink powers are allocated:


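$$\mathcal{R} = \left\{(R_1, \ldots, R_K) : R_k \leq B\log_2\left(1 + P_k^{\rm ul}\mathbf{h}_k^H\left(\sum_{i=1, i\neq k}^{K}P_i^{\rm ul}\mathbf{h}_i\mathbf{h}_i^H + BN_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k\right) \text{ for } k = 1, \ldots, K, \text{ for some } P_1^{\rm ul}, \ldots, P_K^{\rm ul} \geq 0 \text{ satisfying } \sum_{k=1}^{K}P_k^{\rm ul} \leq P\right\}. \tag{6.128}$$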


The rate expression and transmit power terms in (6.128) originate from the 
corresponding uplink scenario, so how to achieve each specific rate in the 
downlink is not apparent. Before taking a closer look at that, we will compare 
(6.128) with the capacity region in Theorem 6.5, obtained with DPC. 
Figure 6.28 compares the downlink rate regions achieved with non-linear 
processing (using DPC) and linear processing. We continue the example with 
K = 2 users considered in many previous figures, such as Figure 6.26. The 
rate regions with M = 4 and M = 8 antennas are shown in Figures 6.28(a) 
and 6.28(b), respectively. The boundary points with linear processing are 
obtained from the parameterization in (6.128) by considering all combinations 
of virtual uplink powers that satisfy $P_1^{\rm ul} + P_2^{\rm ul} = P$. Linear processing results
in a smaller region than non-linear processing, but the difference reduces as 
we increase the number of antennas, just as in the uplink. The loss in sum rate 
from linear processing is 4% with M = 4 but only 0.4% with M = 8, roughly 
the same as in the uplink. From a mathematical perspective, the channel 
vectors become more easily distinguishable as the number of dimensions in 
the vector space increases, which makes it possible to find precoding vectors 
that avoid causing inter-user interference without sacrificing much of the 
signal strength at the intended receiver. This is an instance of the favorable 
propagation property specified in Definition 6.2. We notice that the rate 
region obtained with linear processing is not a convex set but has a slightly curvy outer boundary. The convex hull of the region is also shown in the figure, and it is achieved by the time-sharing procedure described earlier in the chapter, where we switch between two operating points to achieve points on the straight line in between. Time-sharing slightly enlarges the region when $M = 4$, while the benefit is unnoticeable when $M = 8$, thanks to the more favorable propagation.

[Figure 6.28 panels: (a) $M = 4$ antennas. (b) $M = 8$ antennas.]

Figure 6.28: Examples of downlink rate regions with $K = 2$ users when multi-user MIMO is used with either non-linear or linear processing. The region obtained in (6.128) is called “linear” and its convex hull is also shown. This is a continuation of the example considered in Figure 6.26.

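Each operating point in the rate region characterization in (6.128) is obtained from a corresponding virtual dual uplink scenario. LMMSE combining is the optimal linear receiver processing in the uplink; thus, the uplink-downlink duality implies that the same operating point is achieved by some kind of LMMSE precoding because we need $\mathbf{p}_k = \mathbf{w}_k^*$. Starting from the LMMSE combining expression in (6.63) and normalizing it to have unit norm, we obtain

$$\mathbf{p}_k = \frac{\left(\sum_{i=1}^{K}\frac{P_i^{\rm ul}}{B}\mathbf{h}_i^*\mathbf{h}_i^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k^*}{\left\|\left(\sum_{i=1}^{K}\frac{P_i^{\rm ul}}{B}\mathbf{h}_i^*\mathbf{h}_i^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k^*\right\|} \tag{6.129}$$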


after removing common scaling factors from the numerator and denominator. 
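We can express the precoding matrix $\mathbf{P} = [\mathbf{p}_1, \ldots, \mathbf{p}_K]$ for all users as

$$\mathbf{P} = \left(\mathbf{H}^*\mathbf{Q}\mathbf{H}^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{H}^*\mathbf{Z}, \tag{6.130}$$

using the channel matrix $\mathbf{H}$ and a diagonal matrix with uplink powers divided by the bandwidth: $\mathbf{Q} = \mathrm{diag}\left(\frac{P_1^{\rm ul}}{B}, \ldots, \frac{P_K^{\rm ul}}{B}\right) \in \mathbb{C}^{K\times K}$. The matrix $\mathbf{Z}$ ensures that each column of $\mathbf{P}$ has unit norm by being selected as

$$\mathbf{Z} = \mathrm{diag}\left(\frac{1}{\left\|\left(\mathbf{H}^*\mathbf{Q}\mathbf{H}^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_1^*\right\|}, \ldots, \frac{1}{\left\|\left(\mathbf{H}^*\mathbf{Q}\mathbf{H}^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_K^*\right\|}\right). \tag{6.131}$$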
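The duality also implies that the same total transmit power is used in the uplink and downlink, but it usually must be allocated differently between the users. For the given uplink powers $P_1^{\rm ul}, \ldots, P_K^{\rm ul}$, we can compute the resulting uplink SINR values $\gamma_1, \ldots, \gamma_K$ using (6.127). If we equate the downlink SINR expressions in (6.126) to the same values, we obtain the equations

$$\frac{P_k^{\rm dl}|\mathbf{h}_k^T\mathbf{p}_k|^2}{\sum_{i=1, i\neq k}^{K}P_i^{\rm dl}|\mathbf{h}_k^T\mathbf{p}_i|^2 + BN_0} = \gamma_k \quad \Leftrightarrow \quad P_k^{\rm dl}\frac{|\mathbf{h}_k^T\mathbf{p}_k|^2}{\gamma_kBN_0} - \sum_{i=1, i\neq k}^{K}P_i^{\rm dl}\frac{|\mathbf{h}_k^T\mathbf{p}_i|^2}{BN_0} = 1 \tag{6.132}$$

for $k = 1, \ldots, K$. These are $K$ linear equations in the $K$ downlink transmit powers $P_1^{\rm dl}, \ldots, P_K^{\rm dl}$; thus, we obtain the downlink powers by solving them as

$$\begin{bmatrix} P_1^{\rm dl} \\ \vdots \\ P_K^{\rm dl} \end{bmatrix} = \boldsymbol{\Gamma}^{-1}\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \quad \text{where } [\boldsymbol{\Gamma}]_{k,i} = \begin{cases} \frac{|\mathbf{h}_k^T\mathbf{p}_k|^2}{\gamma_kBN_0}, & i = k, \\ -\frac{|\mathbf{h}_k^T\mathbf{p}_i|^2}{BN_0}, & i \neq k, \end{cases} \tag{6.133}$$

contains all the equation coefficients and $[\boldsymbol{\Gamma}]_{k,i}$ denotes the $(k,i)$th entry of the $K \times K$ matrix $\boldsymbol{\Gamma}$. We now have a way to map any point in the downlink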
region, which was parameterized in (6.128) based on the dual uplink powers, 
to the downlink precoding vectors and power allocation that achieves it. 
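This mapping can be sketched in Python as follows. The sketch computes the LMMSE precoding matrix (6.130)-(6.131), the uplink SINRs (6.127), the downlink powers (6.133), and finally the downlink SINRs (6.126). The azimuth angles, $M = 8$, $B = N_0 = 1$, the assumed half-wavelength ULA LOS model, and the uplink powers are illustrative assumptions.

```python
import numpy as np

def lmmse_precoding(H, P_ul, B, N0):
    """(6.130)-(6.131): P = (H^* Q H^T + N0 I_M)^{-1} H^* Z with unit-norm columns."""
    M = H.shape[0]
    Q = np.diag(P_ul / B)
    P = np.linalg.solve(H.conj() @ Q @ H.T + N0 * np.eye(M), H.conj())
    return P / np.linalg.norm(P, axis=0)      # the role of Z: unit-norm columns

def uplink_sinr(H, P, P_ul, B, N0):
    """(6.127) with w_k = p_k^*; G[k, i] = |p_k^T h_i|^2."""
    G = np.abs(P.T @ H) ** 2
    signal = P_ul * np.diag(G)
    return signal / (G @ P_ul - signal + B * N0)

def downlink_sinr(H, P, P_dl, B, N0):
    """(6.126); F[k, i] = |h_k^T p_i|^2."""
    F = np.abs(H.T @ P) ** 2
    signal = P_dl * np.diag(F)
    return signal / (F @ P_dl - signal + B * N0)

def downlink_powers(H, P, gamma, B, N0):
    """(6.133): solve Gamma [P_1^dl, ..., P_K^dl]^T = [1, ..., 1]^T."""
    F = np.abs(H.T @ P) ** 2
    Gamma = -F / (B * N0)
    np.fill_diagonal(Gamma, np.diag(F) / (gamma * B * N0))
    return np.linalg.solve(Gamma, np.ones(len(gamma)))

M, B, N0 = 8, 1.0, 1.0                                 # normalized, illustrative values
angles = np.array([-np.pi / 16, -np.pi / 32, 0.0, np.pi / 24])
H = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(angles)))  # assumed ULA LOS channels
P_ul = np.array([4.0, 1.0, 2.0, 3.0])                  # arbitrary virtual uplink powers

P = lmmse_precoding(H, P_ul, B, N0)
gamma = uplink_sinr(H, P, P_ul, B, N0)
P_dl = downlink_powers(H, P, gamma, B, N0)
print("uplink SINRs:  ", np.round(gamma, 4))
print("downlink SINRs:", np.round(downlink_sinr(H, P, P_dl, B, N0), 4))
print("total power: uplink", P_ul.sum(), "downlink", P_dl.sum())
```

The downlink SINRs reproduce the uplink ones and the total power is unchanged, while the per-user powers are redistributed.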


6.4.5 Alternative Linear Downlink Processing Schemes 


Despite the uplink-downlink duality, selecting preferable downlink precoding 
and power allocation can be challenging in practice. The duality holds between 
the downlink and the virtual uplink, where we are allowed to allocate power 
freely between the users. Hence, even if we design the actual uplink operation 
optimally in some sense (e.g., using the max-min fairness power control 
described in Section 6.3.6), the corresponding downlink operation obtained 
through (6.130) and (6.133) might not achieve a point on the Pareto boundary 
of the downlink rate region. It is unlikely that the specific uplink power 
division enforced by the maximum power per user in the uplink will happen 
to be optimal in the downlink. For this reason, the duality result is typically 
interpreted more loosely as the following rule of thumb [1]: the base station 
should transmit to a user in the downlink in roughly the same direction as it 
obtains a strong uplink SINR through receive combining. In other words, a 
combining vector that works well in the uplink also works well as a precoding 
vector in the downlink but might not be optimal. 

A key challenge when selecting the downlink processing is that the power 
allocation and precoding selection are intertwined in a complex way through 
the mapping between power coefficients in the downlink and the virtual 
uplink. A common approximate solution is to replace the uplink powers with 
heuristically selected coefficients in the precoding expression and then optimize 
the downlink power allocation separately. If we replace each uplink coefficient 
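$P_k^{\rm ul}$ in the LMMSE precoding vector in (6.129) by the downlink coefficient $P_k^{\rm dl}$, we obtain

$$\mathbf{p}_k = \frac{\left(\sum_{i=1}^{K}\frac{P_i^{\rm dl}}{B}\mathbf{h}_i^*\mathbf{h}_i^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k^*}{\left\|\left(\sum_{i=1}^{K}\frac{P_i^{\rm dl}}{B}\mathbf{h}_i^*\mathbf{h}_i^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k^*\right\|}. \tag{6.134}$$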


This alternative design was called the transmit Wiener filter (TWF) in [102], 
where it was motivated by minimizing the MSE between the transmitted signal 
vector and the scaled received signal vector at the users. It was also proposed 
in [103], [104] as the precoding that maximizes the signal-to-leakage-and-noise ratio (SLNR), which is obtained by replacing the downlink interference term for a given user with the total interference that the user’s signal leaks to the other users.

Another way of simplifying the precoding expression is to assume equal 
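power allocation in the virtual uplink. By substituting $P_k^{\rm ul} = P/K$ into (6.130) and moving the power and bandwidth terms to the noise term, we obtain

$$\mathbf{P}^{\rm RZF} = \left(\mathbf{H}^*\mathbf{H}^T + \frac{KBN_0}{P}\mathbf{I}_M\right)^{-1}\mathbf{H}^*\mathbf{Z}^{\rm RZF} = \mathbf{H}^*\left(\mathbf{H}^T\mathbf{H}^* + \frac{KBN_0}{P}\mathbf{I}_K\right)^{-1}\mathbf{Z}^{\rm RZF}, \tag{6.135}$$

where the second equality follows from the matrix identity in (2.50) and

$$\mathbf{Z}^{\rm RZF} = \mathrm{diag}\left(\frac{1}{\left\|\left(\mathbf{H}^*\mathbf{H}^T + \frac{KBN_0}{P}\mathbf{I}_M\right)^{-1}\mathbf{h}_1^*\right\|}, \ldots, \frac{1}{\left\|\left(\mathbf{H}^*\mathbf{H}^T + \frac{KBN_0}{P}\mathbf{I}_M\right)^{-1}\mathbf{h}_K^*\right\|}\right). \tag{6.136}$$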
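This is often referred to as regularized zero-forcing (RZF) because the expression resembles that of ZF in (6.69), but the inverse of $\mathbf{H}^T\mathbf{H}^*$ has been regularized by adding a scaled identity matrix. Regularization is a classical way to enhance numerical stability in linear algebra algorithms, but here, it determines how strong the interference is compared to the noise. By considering the high-SNR limit $P \to \infty$, (6.135) converges to ZF precoding:

$$\mathbf{P}^{\rm RZF} \to \mathbf{P}^{\rm ZF} = \mathbf{H}^*\left(\mathbf{H}^T\mathbf{H}^*\right)^{-1}\mathbf{Z}^{\rm ZF}. \tag{6.137}$$

ZF precoding has the property that $\mathbf{H}^T\mathbf{P}^{\rm ZF} = \mathbf{H}^T\mathbf{H}^*(\mathbf{H}^T\mathbf{H}^*)^{-1}\mathbf{Z}^{\rm ZF} = \mathbf{Z}^{\rm ZF}$, so the impact of the channel matrix appears to vanish from the received signal expression in (6.124). However, the channel still impacts the selection of the matrix $\mathbf{Z}^{\rm ZF}$ that normalizes the columns of the precoding matrix. The diagonal entries of $(\mathbf{P}^{\rm ZF})^H\mathbf{P}^{\rm ZF}$ must equal one, where

$$(\mathbf{P}^{\rm ZF})^H\mathbf{P}^{\rm ZF} = \left(\mathbf{H}^*(\mathbf{H}^T\mathbf{H}^*)^{-1}\mathbf{Z}^{\rm ZF}\right)^H\mathbf{H}^*(\mathbf{H}^T\mathbf{H}^*)^{-1}\mathbf{Z}^{\rm ZF} = \mathbf{Z}^{\rm ZF}(\mathbf{H}^T\mathbf{H}^*)^{-1}\mathbf{Z}^{\rm ZF}. \tag{6.138}$$

We need $\mathbf{Z}^{\rm ZF} = \mathrm{diag}\left(1/\sqrt{[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{11}}, \ldots, 1/\sqrt{[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{KK}}\right)$ to make the diagonal entries equal to one. If we substitute the ZF precoding matrix into the general rate expression in (6.126), we obtain the simplified expression

$$R_k^{\rm ZF} = B\log_2\left(1 + \frac{P_k^{\rm dl}}{BN_0[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{kk}}\right). \tag{6.139}$$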


This expression contains no interference since ZF precoding leads to a beamformed transmission that creates nulls at all the co-users. The SNR term contains the factor $1/[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{kk}$, which determines how strong the remaining channel to user $k$ is when the precoding has been restricted to cause no interference. It is no surprise that RZF turns into ZF at high SNR because the interference will then dominate over the noise.
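The Python sketch below implements (6.135)-(6.138), checks that ZF produces zero inter-user interference and that (6.139) agrees with the general expression (6.126), and shows RZF approaching ZF as the SNR grows. Here, `snr` stands for $P/(BN_0)$, and the assumed ULA LOS channels and all parameter values are chosen for illustration only.

```python
import numpy as np

def rzf_precoding(H, snr):
    """(6.135)-(6.136) with snr = P/(B N0), so the regularization K B N0 / P equals K / snr."""
    K = H.shape[1]
    P = H.conj() @ np.linalg.inv(H.T @ H.conj() + (K / snr) * np.eye(K))
    return P / np.linalg.norm(P, axis=0)

def zf_precoding(H):
    """(6.137)-(6.138): H^* (H^T H^*)^{-1} Z^ZF with Z^ZF = diag(1/sqrt([(H^T H^*)^{-1}]_kk))."""
    Ginv = np.linalg.inv(H.T @ H.conj())
    return H.conj() @ Ginv / np.sqrt(np.real(np.diag(Ginv)))

M, K, B, N0 = 10, 4, 1.0, 1.0                          # normalized, illustrative values
angles = np.array([-np.pi / 16, -np.pi / 32, 0.0, np.pi / 24])
H = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(angles)))  # assumed ULA LOS channels

P_zf = zf_precoding(H)
print("|H^T P_zf| =\n", np.round(np.abs(H.T @ P_zf), 6))   # diagonal Z^ZF, zero off-diagonal

# ZF rates: closed form (6.139) versus the general expression (6.126)
P_dl = np.full(K, 10.0)
d = np.real(np.diag(np.linalg.inv(H.T @ H.conj())))       # [(H^T H^*)^{-1}]_kk
rate_closed = B * np.log2(1 + P_dl / (B * N0 * d))
F = np.abs(H.T @ P_zf) ** 2
sinr = P_dl * np.diag(F) / (F @ P_dl - P_dl * np.diag(F) + B * N0)
rate_general = B * np.log2(1 + sinr)
print("rates (6.139):", np.round(rate_closed, 4), "rates (6.126):", np.round(rate_general, 4))

# RZF approaches ZF as the SNR grows
for snr in [1e0, 1e3, 1e9]:
    print(f"snr = {snr:g}: max |P_rzf - P_zf| = {np.max(np.abs(rzf_precoding(H, snr) - P_zf)):.2e}")
```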
Example 6.10. How should the transmit power be allocated to maximize the sum rate or achieve max-min fairness when ZF precoding is utilized?

Power allocation optimization is relatively simple when using ZF precoding, thanks to the lack of interference. The sum of the rates in (6.139) is

$$\sum_{k=1}^{K}B\log_2\left(1 + \frac{P_k^{\rm dl}s_k^2}{N_0}\right) \tag{6.140}$$

using the notation $s_k^2 = 1/(B[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{kk})$, which is a sum over $K$ parallel user channels. It has the same form as the rate expression in (3.67) for a point-to-point MIMO channel, in which case the parallel channels were created using the SVD and the sum rate was maximized by water-filling power allocation. Hence, the corresponding way of maximizing (6.140) under the total power constraint $\sum_{k=1}^{K}P_k^{\rm dl} \leq P$ is to use the transmit power

$$P_k^{\rm dl,sum\text{-}rate} = \max\left(\mu - BN_0[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{kk}, 0\right), \quad k = 1, \ldots, K, \tag{6.141}$$

where the variable $\mu$ is selected to make $\sum_{k=1}^{K}P_k^{\rm dl,sum\text{-}rate} = P$.

Max-min fairness is achieved by giving all users the same SINR value and maximizing that common value. The SINR in (6.139) becomes $c/(BN_0)$ for all users if $P_k^{\rm dl} = c[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{kk}$ for $k = 1, \ldots, K$. This common SINR is maximized by making the scaling factor $c$ as large as possible while complying with the sum power constraint. The maximum is achieved for $c = P/\sum_{i=1}^{K}[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{ii}$, for which all the available power is used; thus, the max-min fairness power allocation is

$$P_k^{\rm dl,max\text{-}min} = \frac{P[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{kk}}{\sum_{i=1}^{K}[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{ii}}, \quad k = 1, \ldots, K. \tag{6.142}$$
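Both power allocations in Example 6.10 take only a few lines. The Python sketch below (illustrative values, assumed ULA LOS channels, $B = N_0 = 1$) computes (6.141) by water-filling with the floor levels $BN_0[(\mathbf{H}^T\mathbf{H}^*)^{-1}]_{kk}$ and (6.142) in closed form, and compares both with equal power allocation.

```python
import numpy as np

def water_fill(gains, total):
    """Maximize sum_k log2(1 + p_k * gains_k) s.t. p_k >= 0, sum_k p_k = total; floors are 1/gains_k."""
    floors = np.sort(1.0 / gains)
    for n in range(len(floors), 0, -1):       # try n active users, largest n first
        mu = (total + floors[:n].sum()) / n
        if mu > floors[n - 1]:
            break
    return np.maximum(mu - 1.0 / gains, 0.0)

M, K, B, N0, P = 10, 4, 1.0, 1.0, 20.0                  # normalized, illustrative values
angles = np.array([-np.pi / 16, -np.pi / 32, 0.0, np.pi / 24])
H = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(angles)))  # assumed ULA LOS channels
d = np.real(np.diag(np.linalg.inv(H.T @ H.conj())))     # [(H^T H^*)^{-1}]_kk
s2 = 1.0 / (B * d)                                       # s_k^2 in (6.140)

def zf_rates(P_dl):
    """Per-user ZF rates (6.139), written as B log2(1 + P_k s_k^2 / N0)."""
    return B * np.log2(1 + P_dl * s2 / N0)

P_sum = water_fill(s2 / N0, P)      # (6.141): floor levels N0/s_k^2 = B N0 d_k
P_mm = P * d / d.sum()              # (6.142)
P_eq = np.full(K, P / K)
for name, alloc in [("sum-rate", P_sum), ("max-min", P_mm), ("equal", P_eq)]:
    r = zf_rates(alloc)
    print(f"{name:8s} powers {np.round(alloc, 3)} sum rate {r.sum():.3f} min rate {r.min():.3f}")
```

By construction, the water-filling allocation gives the largest sum rate and the max-min allocation gives the largest minimum rate among the three allocations.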


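To analyze the low-SNR regime, we can return to the precoding vector expression in (6.134) and let $P_1^{\rm dl}, \ldots, P_K^{\rm dl} \to 0$, which leads to

$$\frac{\left(\sum_{i=1}^{K}\frac{P_i^{\rm dl}}{B}\mathbf{h}_i^*\mathbf{h}_i^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k^*}{\left\|\left(\sum_{i=1}^{K}\frac{P_i^{\rm dl}}{B}\mathbf{h}_i^*\mathbf{h}_i^T + N_0\mathbf{I}_M\right)^{-1}\mathbf{h}_k^*\right\|} \to \frac{\mathbf{h}_k^*}{\|\mathbf{h}_k\|} = \mathbf{p}_k^{\rm MRT}. \tag{6.143}$$

We recognize this as the expression for MRT precoding from (3.44), which we recall maximizes the SNR in the absence of interference. It also maximizes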
the SINR in the multi-user setting when the interference is negligibly weak 
compared to the noise. If we substitute the MRT vector into the general rate 
expression in (6.126), we obtain


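$$R_k^{\mathrm{MRT}} = B\log_2\left(1 + \frac{P_k^{\mathrm{dl}}\|\mathbf{h}_k\|^2}{\sum_{i=1,i\neq k}^{K} P_i^{\mathrm{dl}}\frac{|\mathbf{h}_k^{T}\mathbf{h}_i^{*}|^2}{\|\mathbf{h}_i\|^2} + BN_0}\right). \tag{6.144}$$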
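The MRT rate (6.144) only requires the channel norms and the pairwise inner products $\mathbf{h}_k^{T}\mathbf{h}_i^{*}$. The Python sketch below evaluates it for all users at once. It assumes that the channels are the columns of `H` and that the powers are given as an array. The function name `mrt_rates` is hypothetical.

```python
import numpy as np

def mrt_rates(H, powers, B, N0):
    """Per-user rates (6.144) with MRT precoding p_k = h_k^* / ||h_k||.

    H: M x K matrix with the channels h_k as columns; powers: P_1^dl, ..., P_K^dl.
    """
    norms2 = np.sum(np.abs(H) ** 2, axis=0)              # ||h_k||^2
    C = np.abs(H.T @ H.conj()) ** 2 / norms2[None, :]    # C[k, i] = |h_k^T h_i^*|^2 / ||h_i||^2
    interference = C @ powers - np.diag(C) * powers      # remove the i = k term
    return B * np.log2(1.0 + powers * norms2 / (interference + B * N0))

# Orthogonal channels: no interference, so every user gets log2(1 + 0.25 * 2 / 0.1) = log2(6)
print(mrt_rates(np.sqrt(2) * np.eye(4), np.full(4, 0.25), B=1.0, N0=0.1))
# Identical channels: each user suffers the full interference from the other one
print(mrt_rates(np.ones((3, 2), dtype=complex), np.array([1.0, 1.0]), B=1.0, N0=1.0))
```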


To compare the mentioned precoding schemes, Figure 6.29 shows the sum 
rate in the downlink counterpart to Figure 6.16. There are K = 4 users 
with equal channel strengths, and the SNR value in the figure represents what is achieved with equal power allocation. The base station is equipped with a ULA with half-wavelength-spaced antennas, and the users have LOS channels with different azimuth angles: $-\pi/16$, $-\pi/32$, $0$, and $+\pi/24$.
We compare multi-user MIMO with non-linear processing (using DPC) and 
linear processing with LMMSE precoding, both based on the virtual uplink 
power allocations that maximize the sum rate. The sum rates with RZF and 
ZF using equal power allocation and the sum-rate-maximizing OMA scheme 
are also shown. The case of M = 10 antennas is considered in Figure 6.29(a) 
and reveals substantial differences between the curves. All the multi-user 
MIMO schemes have the same slope at high SNRs, demonstrating that they 
reach the same multiplexing gain of min(M, K) = K. However, there is a 
substantial gap between the non-linear and linear processing schemes, which 
showcases the benefit of removing interference using DPC. All the considered 
linear schemes perform identically at high SNRs, as expected from the fact 
that they all converge to ZF in that regime. At lower SNRs, the optimal 
LMMSE precoding is better than the simplified RZF precoding and much 
better than ZF. The OMA curve outperforms ZF at low SNRs, although it 
has a four times smaller slope as only a single user is served at a time. Hence, 
if one must choose between the simplified RZF and ZF schemes, then RZF is 
preferred since it works reasonably well at all SNRs. 
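The high-SNR behavior described above can be reproduced qualitatively with a few lines of Python. The sketch assumes unit-gain LOS channels $[1, e^{-\imath\pi\sin\varphi_k}, \ldots, e^{-\imath\pi(M-1)\sin\varphi_k}]^{T}$ for the half-wavelength ULA (an assumed phase convention), the four user angles from the text, and M = 10 antennas. Its SNR axis is the total power normalized by $BN_0$, which is an assumption and differs from the normalization in Figure 6.29, so the numbers are not meant to match the figure. At high SNR, the ZF sum rate should grow by roughly $K\log_2(10)$ bit/s/Hz per 10 dB, while MRT saturates because it is interference-limited. The helper names are hypothetical.

```python
import numpy as np

def ula_los_channels(M, azimuths):
    """Unit-gain LOS channels for a half-wavelength ULA with zero elevation (assumed
    convention): column k is [1, e^{-j pi sin(phi_k)}, ..., e^{-j pi (M-1) sin(phi_k)}]^T."""
    m = np.arange(M)[:, None]
    return np.exp(-1j * np.pi * m * np.sin(np.asarray(azimuths))[None, :])

def sum_spectral_efficiency(H, W, powers, noise):
    """Sum of log2(1 + SINR_k) in bit/s/Hz for unit-norm precoders in the columns of W;
    `noise` plays the role of B*N0."""
    G = np.abs(H.T @ W) ** 2
    signal = powers * np.diag(G)
    return np.sum(np.log2(1.0 + signal / (G @ powers - signal + noise)))

angles = [-np.pi / 16, -np.pi / 32, 0.0, np.pi / 24]   # user angles from the text
H = ula_los_channels(10, angles)                         # M = 10 as in Figure 6.29(a)
K = H.shape[1]
W_mrt = H.conj() / np.linalg.norm(H, axis=0)
V = H.conj() @ np.linalg.inv(H.T @ H.conj())
W_zf = V / np.linalg.norm(V, axis=0)

for P_db in (0, 10, 20, 30, 40):
    P = 10 ** (P_db / 10)            # total power normalized by B*N0 (illustrative)
    eq = np.full(K, P / K)           # equal power allocation
    print(P_db, sum_spectral_efficiency(H, W_mrt, eq, 1.0),
          sum_spectral_efficiency(H, W_zf, eq, 1.0))
```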

The number of antennas is increased to M = 20 in Figure 6.29(b), and 
then all the multi-user MIMO schemes provide indistinguishable performance. 
The antenna-user ratio is M/K = 5. The same kind of behavior was observed
in the uplink: linear processing is nearly optimal when the base station has
around five times more antennas than the number of single-antenna users.
This is the Massive MIMO operating regime for which 5G NR systems (in the
mid-band) are designed by having M = 64 antennas and serving $1 \leq K \leq 16$
users, depending on the traffic load. These systems are purposely designed not 
to need complex non-linear processing, and the sizeable antenna-user ratio 
gives robustness to various practical imperfections, such as imperfect channel 
knowledge and hardware imperfections; see [1] for further details. 
[Figure 6.29: (a) M = 10 antennas; (b) M = 20 antennas.]


Figure 6.29: The downlink sum rate in a multi-user MIMO system with K = 4 users and 
either non-linear or linear processing. All the users have the same SNR if equal power allocation 
is used. The non-linear and LMMSE processing curves are obtained using sum-rate maximizing 
power allocation, while RZF and ZF use equal power allocation. OMA, where only one user is 
served, is shown as a reference and does not provide any multiplexing gain. 
Example 6.11. Can we reduce the gap between DPC-based processing and
LMMSE precoding without using DPC? 

Yes, one way to increase the sum rate is to use the rate-splitting technique 
[105]. The core idea is to transmit an additional data signal $x_{\mathrm{c}}$ using the power $P_{\mathrm{c}}^{\mathrm{dl}}$ and the precoding vector $\mathbf{p}_{\mathrm{c}}$. This signal contains a collection of data for everyone and is decoded by all users while treating other interfering signals as noise. The common signal can be encoded at the rate


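$$R_{\mathrm{c}} = \min_{k\in\{1,\ldots,K\}} B\log_2\left(1 + \frac{P_{\mathrm{c}}^{\mathrm{dl}}|\mathbf{h}_k^{T}\mathbf{p}_{\mathrm{c}}|^2}{\sum_{i=1}^{K} P_i^{\mathrm{dl}}|\mathbf{h}_k^{T}\mathbf{p}_i|^2 + BN_0}\right), \tag{6.145}$$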


where the minimization over the user indices allows all users to decode it. The 
data contained in the common signal is divided between the users, but any 
user k can remove the entire common signal from its received signal before decoding $x_k$, as described earlier in this section. The sum rate is therefore


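$$R_{\mathrm{c}} + \sum_{k=1}^{K} B\log_2\left(1 + \frac{P_k^{\mathrm{dl}}|\mathbf{h}_k^{T}\mathbf{p}_k|^2}{\sum_{i=1,i\neq k}^{K} P_i^{\mathrm{dl}}|\mathbf{h}_k^{T}\mathbf{p}_i|^2 + BN_0}\right) \tag{6.146}$$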


and can be maximized with respect to the precoding vectors $\mathbf{p}_{\mathrm{c}}, \mathbf{p}_1, \ldots, \mathbf{p}_K$ and transmit power coefficients $P_{\mathrm{c}}^{\mathrm{dl}}, P_1^{\mathrm{dl}}, \ldots, P_K^{\mathrm{dl}}$, which must satisfy the constraint $P_{\mathrm{c}}^{\mathrm{dl}} + \sum_{k=1}^{K} P_k^{\mathrm{dl}} \leq P$. The term rate-splitting refers to how each user's data rate is split into a "public" part contained in $x_{\mathrm{c}}$ and a "private" part contained in $x_k$. With an informed design, communication with rate-splitting cannot be worse than linear precoding since that is a special case obtained by setting $P_{\mathrm{c}}^{\mathrm{dl}} = 0$. On the other hand, it relies on SIC, which has
the many practical downsides described earlier in this chapter. Moreover, it 
can only give a noticeable improvement in scenarios such as Figure 6.29(a), 
where there is a substantial gap between DPC-based processing and LMMSE 
precoding. The most attractive gains might exist in situations with limited 
channel knowledge, which are beyond the scope of this book. 
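To make (6.145) and (6.146) concrete, the Python sketch below evaluates the rate-splitting sum rate for given precoders and powers. The MRT private precoders, the common precoder (the normalized sum of the conjugated channels), and the power splits are arbitrary illustrative choices, not optimized designs, so the sketch only shows how the two rate terms are computed. The function name `rate_splitting_sum_rate` is hypothetical. Setting $P_{\mathrm{c}}^{\mathrm{dl}} = 0$ recovers the linear-precoding sum rate, which is the special case mentioned above.

```python
import numpy as np

def rate_splitting_sum_rate(H, W, powers, p_c, P_c, B, N0):
    """Sum rate (6.146): common rate (6.145) plus private rates after removing x_c.

    H: M x K channels (columns h_k); W: M x K unit-norm private precoders;
    powers: private powers P_k^dl; p_c: unit-norm common precoder; P_c: its power.
    """
    G = np.abs(H.T @ W) ** 2                      # G[k, i] = |h_k^T p_i|^2
    all_private = G @ powers                      # sum_i P_i |h_k^T p_i|^2 (all private signals)
    g_c = np.abs(H.T @ p_c) ** 2                  # |h_k^T p_c|^2
    R_c = np.min(B * np.log2(1.0 + P_c * g_c / (all_private + B * N0)))  # weakest user decides
    signal = powers * np.diag(G)
    private = B * np.log2(1.0 + signal / (all_private - signal + B * N0))
    return R_c + private.sum()

rng = np.random.default_rng(2)
M, K, P = 4, 3, 10.0
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
W = H.conj() / np.linalg.norm(H, axis=0)          # MRT private precoders (illustrative)
p_c = H.conj().sum(axis=1)
p_c /= np.linalg.norm(p_c)                        # hypothetical common precoder
for P_c in (0.0, 2.0, 5.0):                       # illustrative splits of the total power P
    powers = np.full(K, (P - P_c) / K)
    print(P_c, rate_splitting_sum_rate(H, W, powers, p_c, P_c, 1.0, 1.0))
```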


6.4.6 Power Allocation for Max-Min Fairness 


Once the linear precoding scheme is determined, the downlink transmit 
power coefficients $P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}} \geq 0$ can be selected to maximize a specific
utility function, under the constraint $\sum_{k=1}^{K} P_k^{\mathrm{dl}} \leq P$. This is known as power
allocation since it entails distributing the available transmit power among 
the users to achieve the desired balance among their achieved rates. In this 
section, we consider power allocation for max-min fairness. We will introduce 
the downlink counterpart to the efficient fixed-point algorithm previously given 
in Algorithm 6.1 for the uplink. Hence, we aim to find the power coefficients 
that achieve the solution to the max-min fairness problem


$$\underset{(R_1,\ldots,R_K)\in\mathcal{R}}{\text{maximize}} \quad \min_{k\in\{1,\ldots,K\}} R_k, \tag{6.147}$$
where the downlink rate region R depends on the adopted linear precoding 
scheme. For any such scheme, it can be expressed in the generic form 


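$$\mathcal{R} = \left\{ (R_1,\ldots,R_K) : R_k = B\log_2\left(1 + \mathrm{SINR}_k(P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}})\right) \text{ for } k = 1,\ldots,K, \text{ for some } P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}} \geq 0 \text{ satisfying } \sum_{k=1}^{K} P_k^{\mathrm{dl}} \leq P \right\}, \tag{6.148}$$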


where the SINR for each user is a function of the transmit power coefficients $P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}$. When ZF precoding is used, the downlink power allocation that achieves max-min fairness was already derived in Example 6.10. When any other fixed normalized precoding vectors $\mathbf{p}_1,\ldots,\mathbf{p}_K$ that are independent of the downlink power coefficients (e.g., RZF precoding in (6.135) or MRT) are used, the SINR of user k can be expressed using (6.126) as


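$$\mathrm{SINR}_k(P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}) = \frac{P_k^{\mathrm{dl}}|\mathbf{h}_k^{T}\mathbf{p}_k|^2}{\sum_{i=1,i\neq k}^{K} P_i^{\mathrm{dl}}|\mathbf{h}_k^{T}\mathbf{p}_i|^2 + BN_0}, \tag{6.149}$$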


where the numerators are linear and the denominators are affine functions of the downlink
power coefficients $P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}$.

Since maximizing the minimum rate is equivalent to maximizing the 
minimum SINR value among the users, (6.147) can be expressed for fixed 
precoding vectors as 


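$$\begin{aligned} \underset{P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}\geq 0}{\text{maximize}} \quad & \min_{k\in\{1,\ldots,K\}} \mathrm{SINR}_k(P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}) \\ \text{subject to} \quad & \sum_{k=1}^{K} P_k^{\mathrm{dl}} \leq P. \end{aligned} \tag{6.150}$$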


Algorithm 6.2 states a fixed-point iteration that finds the optimal solution. 
The algorithm starts from arbitrarily selected non-zero power coefficients 
$P_k^{\mathrm{dl}} \in (0, P]$ and sets a solution accuracy $\epsilon > 0$. As in the uplink counterpart
in Algorithm 6.1, each user that achieves an SINR larger than the minimum 
SINR is assigned a reduced transmit power in Step 3. Next, in Step 4, all 
the power coefficients are scaled so that the total transmit power equals 
the maximum value of P. In fact, it can be proved that the optimal power 
allocation must use all the available power. The process continues iteratively 
until a stopping criterion is satisfied. The difference between the maximum 
and minimum SINRs among the users gradually diminishes, and the stopping 
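Algorithm 6.2 Solution to the max-min fairness problem in (6.150).

1: Initialization: Select arbitrary $P_k^{\mathrm{dl}} \in (0, P]$, for k = 1,...,K, and the solution accuracy $\epsilon > 0$
2: while $\max_{i\in\{1,\ldots,K\}} \mathrm{SINR}_i(P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}) - \min_{i\in\{1,\ldots,K\}} \mathrm{SINR}_i(P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}) > \epsilon$ do
3:     $P_k^{\mathrm{dl}} \leftarrow \dfrac{\min_{i\in\{1,\ldots,K\}} \mathrm{SINR}_i(P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}})}{\mathrm{SINR}_k(P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}})}\,P_k^{\mathrm{dl}}$, for k = 1,...,K
4:     $P_k^{\mathrm{dl}} \leftarrow \dfrac{P}{\sum_{i=1}^{K} P_i^{\mathrm{dl}}}\,P_k^{\mathrm{dl}}$, for k = 1,...,K
5: end while
6: Output: $P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}$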


criterion in Step 2 determines when the difference becomes less than $\epsilon$. As in
the uplink, the algorithm usually converges in fewer than ten iterations. 

The convergence to the optimal solution to the max-min fairness problem 
is guaranteed if certain technical conditions are satisfied [91, Lem. 1, Th. 1], 
which is the case when the downlink SINR has the generic form in (6.149). 
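The steps of Algorithm 6.2 translate almost line by line into code. The Python sketch below is a minimal illustration under stated assumptions: the precoders are fixed unit-norm columns of `W`, the SINR has the form (6.149), and the channels are random i.i.d. draws. It uses MRT rather than the RZF precoding of Figure 6.30 only to keep the sketch short. The iteration cap `max_iter` is a safeguard that is not part of Algorithm 6.2, and the function names are hypothetical.

```python
import numpy as np

def downlink_sinr(H, W, powers, B, N0):
    """SINR (6.149) of each user for fixed unit-norm precoders in the columns of W."""
    G = np.abs(H.T @ W) ** 2                      # G[k, i] = |h_k^T p_i|^2
    signal = powers * np.diag(G)
    return signal / (G @ powers - signal + B * N0)

def max_min_fair_powers(H, W, P, B, N0, eps=1e-6, max_iter=1000):
    """Fixed-point iteration following the steps of Algorithm 6.2."""
    K = H.shape[1]
    powers = np.full(K, P / K)                    # Step 1: any initial powers in (0, P]
    sinr = downlink_sinr(H, W, powers, B, N0)
    for _ in range(max_iter):                     # safeguard on the number of iterations
        if sinr.max() - sinr.min() <= eps:        # Step 2: stopping criterion
            break
        powers = powers * sinr.min() / sinr       # Step 3: reduce the power of users above the minimum
        powers = powers * P / powers.sum()        # Step 4: rescale to use all the power
        sinr = downlink_sinr(H, W, powers, B, N0)
    return powers, sinr

rng = np.random.default_rng(3)
M, K = 6, 4
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
W = H.conj() / np.linalg.norm(H, axis=0)          # MRT, a fixed precoder covered by (6.149)
powers, sinr = max_min_fair_powers(H, W, P=10.0, B=1.0, N0=1.0)
print(powers, sinr)
```

With ZF precoders, there is no interference and a single pass through Steps 3 and 4 already returns the closed-form allocation (6.142) from Example 6.10.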

Figure 6.30 demonstrates the max-min fairness solution obtained by Al- 
gorithm 6.2 in a system with K = 4 users. The setup is the same as in 
Figure 6.29(a), and each user achieves an SNR of 10 dB if equal power allocation is used. Figure 6.30(a) shows the variations in the rates obtained by the
four users throughout the algorithm’s iterations when using M = 6 antennas 
and RZF precoding. Initially, there are significant rate discrepancies among 
the users because the initial equal power allocation is suboptimal. However, 
as the algorithm progresses, the rates of all four users gradually converge to a 
common value, representing the max-min fairness solution. The minimum rate 
among the users experiences gradual enhancement; however, the convergence 
behavior is not strictly monotonic because reducing the power for some users 
can improve the rates of other users after the power normalization. 

Figure 6.30(b) demonstrates the minimum rate among the K = 4 users 
as the number of antennas M increases, considering both RZF and MRT 
precoding. In addition to the max-min fairness solutions obtained by Algo- 
rithm 6.2, the minimum rates achieved by equal power allocation among the 
users are shown as references. For both RZF and MRT, employing max-min 
fair power allocation increases the minimum rate as M grows, indicating 
improved communication performance with a greater number of antennas. As 
expected, the max-min fairness power allocation yields higher minimum rates 
than equal power allocation, regardless of the precoding scheme employed. 
However, a non-monotonic trend is observed when increasing M and using 
RZF precoding with equal power allocation. This peculiarity arises since 
the power is not allocated based on the interference levels generated by the 
precoding. 
Figure 6.30: The max-min fairness solution obtained by Algorithm 6.2 with K = 4 users, in
the same setup as in Figure 6.29(a). All the users have the same SNR of 10 dB when using 
equal power allocation. In (a), the rates of the four users during the fixed-point iterations are 
shown when RZF precoding and M = 6 antennas are used. In (b), the minimum rate among 
the users is shown for a varying number of antennas M when using RZF and MRT precoding. 
The minimum rate obtained using equal power allocation is shown as a reference. 




6.5 Exercises 


Exercise 6.1. The uplink rate region of a multi-user MIMO system with K = 2 is 
R= { (Fa, Ra) Pi Ro >0, Ži 4 R < 10Mbit/s}. (6.151) 


(a) Find the expression for the Pareto boundary. 
(b) Find the maximum achievable rate of the second user if $R_1 \geq 15$ Mbit/s is required.


Exercise 6.2. The Pareto boundary of the uplink rate region for a multi-user MIMO 
system with K = 2 is 


OR = { (R1, Rə) : Ri, Ro > 0, RÌ + 2R2 = 48 Mbit/s} . (6.152) 


(a) Find the max-min fairness point on the Pareto boundary. 
(b) Find the maximum sum-rate point on the Pareto boundary. 
(c) Find the point on the Pareto boundary that maximizes the weighted sum rate $3R_1 + R_2$.


Exercise 6.3. The bandwidth allocation that maximizes the uplink sum rate with FDMA 
is stated in (6.13). Derive this expression by maximizing S & Bloga (1 + spin) 
with respect to k > 0, for k =1,..., K, under the condition 1 +... +x =1. 


Exercise 6.4. Consider the uplink multi-user MIMO channel in Theorem 6.2 with M = 4 
base station antennas, K = 2 users, and B = 10 MHz. 
(a) Suppose BK; = ł, hı = [1, 1, 1, 1]7, and hə = [1, —1, 1, —1]7. Sketch the 
capacity region and explain its shape. 
(b) Suppose 5Ẹ- = 3, hi = [1, 1, 1, 1]7, and hə = [1, 1, 1, 1]7. Sketch the capacity 


BNo 
region and explain its shape. 


(c) Suppose ane = 1, hm = [1, 1, ...]7, and hı = [1, -1, 1, -1, ...]7. For which 
values of M is the sum rate greater or equal to 100 Mbit/s? 


Exercise 6.5. Consider the uplink rate region in Figure 6.10 with NOMA and K = 3. 


(a) Which user data decoding order is needed to operate at the top-left corner of the 
Pareto boundary? Is time-sharing required? 


(b) Which user data decoding order is needed to operate at the top-right corner of 
the Pareto boundary? Is time-sharing required? 


(c) How can we achieve an arbitrary point on the line between the two top corners of 
the Pareto boundary? 


Exercise 6.6. Prove that at least one user must use maximum uplink power when 
achieving the max-min fairness solution with any linear receive combining scheme that 
gives $|\mathbf{w}_k^{H}\mathbf{h}_k|^2 > 0$ for all users. Hint: Use a proof-by-contradiction approach.


Exercise 6.7. Consider an uplink multi-user MIMO system with linear processing. Show 
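that the optimal receive combining is a linear combination of the channels: $\mathbf{w}_k = \mathbf{H}\bar{\mathbf{w}}_k$ for some $\bar{\mathbf{w}}_k \in \mathbb{C}^{K}$, for k = 1,...,K. Hint: An arbitrary combining vector can be expressed as $\mathbf{w} = \mathbf{H}\bar{\mathbf{w}} + \mathbf{w}_{\perp}$, where $\mathbf{w}_{\perp} \in \mathbb{C}^{M}$ is orthogonal to the channel vectors, i.e., $\mathbf{H}^{H}\mathbf{w}_{\perp} = \mathbf{0}$. It is sufficient to prove that picking $\mathbf{w}_{\perp} = \mathbf{0}$ does not reduce the rates.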
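Exercise 6.8. Consider an uplink multi-user MIMO system with linear processing and K = 2. Show that the optimal combining vector $\mathbf{w}_k$ for user k, for k = 1,2, is a linear combination of the MRC and ZF combining vectors:

$$\mathbf{w}_k = a_k^{\mathrm{MRC}}\mathbf{w}_k^{\mathrm{MRC}} + a_k^{\mathrm{ZF}}\mathbf{w}_k^{\mathrm{ZF}} \quad \text{for some } a_k^{\mathrm{MRC}}, a_k^{\mathrm{ZF}} \in \mathbb{C}. \tag{6.153}$$

Hint: Use the result from Exercise 6.7 to express the optimal receive combining vector as $\mathbf{w}_k = \alpha_{k,1}\mathbf{h}_1 + \alpha_{k,2}\mathbf{h}_2$ for some values of $\alpha_{k,1}, \alpha_{k,2} \in \mathbb{C}$. Show that for any $\alpha_{k,1}, \alpha_{k,2}$, one can find $a_k^{\mathrm{MRC}}, a_k^{\mathrm{ZF}} \in \mathbb{C}$ so that $\mathbf{w}_k = a_k^{\mathrm{MRC}}\mathbf{w}_k^{\mathrm{MRC}} + a_k^{\mathrm{ZF}}\mathbf{w}_k^{\mathrm{ZF}}$ holds.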


Exercise 6.9. The ZF combining matrix was defined as $\mathbf{W}^{\mathrm{ZF}} = \mathbf{H}(\mathbf{H}^{H}\mathbf{H})^{-1}$ in (6.69), which leads to the rate expression in (6.71). Alternatively, the ZF vector can be interpreted as an orthogonal projection of the desired channel vector onto the null space of the interfering channels. By following this approach in a system with K = 2 users, the ZF combining vector of user 1 becomes

$$\mathbf{w}_1^{\text{ZF-alternative}} = \left(\mathbf{I}_M - \frac{\mathbf{h}_2}{\|\mathbf{h}_2\|}\frac{\mathbf{h}_2^{H}}{\|\mathbf{h}_2\|}\right)\mathbf{h}_1, \tag{6.154}$$

which is the orthogonal projection of $\mathbf{h}_1$ onto the null space of the other user's channel $\mathbf{h}_2$. Show that user 1 achieves the rate in (6.71) when using $\mathbf{w}_1^{\text{ZF-alternative}}$.


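Exercise 6.10. Consider downlink communication to K = 2 users from an M-antenna base station, where M is an even number. The channels to the users are decomposed as $\mathbf{h}_1 = [\mathbf{h}_{1,1}^{T}\ \mathbf{h}_{1,2}^{T}]^{T}$ and $\mathbf{h}_2 = [\mathbf{h}_{2,1}^{T}\ \mathbf{h}_{2,2}^{T}]^{T}$, respectively, where $\mathbf{h}_{1,1} \in \mathbb{C}^{M/2}$ and $\mathbf{h}_{2,1} \in \mathbb{C}^{M/2}$ are the channels from the first M/2 antennas of the base station to the users. Similarly, $\mathbf{h}_{1,2} \in \mathbb{C}^{M/2}$ and $\mathbf{h}_{2,2} \in \mathbb{C}^{M/2}$ are the channels from the last M/2 antennas to the users. Suppose the channels are orthogonal in the sense that $\mathbf{h}_{1,1}^{H}\mathbf{h}_{2,1} = 0$ and $\mathbf{h}_{1,2}^{H}\mathbf{h}_{2,2} = 0$. Moreover, it holds that $\|\mathbf{h}_{1,1}\|^2 = \|\mathbf{h}_{1,2}\|^2 = \|\mathbf{h}_{2,1}\|^2 = \|\mathbf{h}_{2,2}\|^2 = M\beta/2$, where $\beta$ is the common channel gain.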


(a) Suppose the first user is served only by the first M/2 antennas, and the second 
user is served only by the last M/2 antennas. What are the rates of the users if 
MRT and equal power allocation are used? What is the sum rate? 


(b) Suppose all the antennas are used for serving both users with MRT precoding. 
What are the rates of the users with equal power allocation? What is the sum rate? 
Compare the results with those obtained in (a) when each antenna is assigned to 
a single user. 


Exercise 6.11. Consider a base station with a ULA with M = 4 antennas and half- 
wavelength antenna spacing. There are free-space LOS channels with zero elevation 
angle to all K users. 


(a) Suppose we transmit with MRT to a user with the channel gain $\beta_1$, located in the azimuth angular direction $\varphi_1$. The transmit power is denoted by P. What is the power of the received interfering signal at another user, located in some other azimuth direction $\varphi_2$ and having the channel gain $\beta_2$?


(b) How should the angles $\varphi_1$ and $\varphi_2$ in (a) be related to have zero interference?


(c) Find a set of four user angles $\varphi_1, \varphi_2, \varphi_3, \varphi_4$ so that we can transmit to all the users using MRT without causing any interference.


(d) Suppose the four user angles are all different but do not satisfy the condition 
derived in (c). Suggest a precoding matrix that removes the interference. 
Exercise 6.12. A telecom operator divides its customers into two categories: i) standard and ii) premium. It promises that a premium user will always get four times higher SINR than any standard user. Consider the downlink of a multi-user MIMO system with some arbitrary fixed linear precoding. Suppose that the $K_{\mathrm{p}}$ users with the indices $k = 1,\ldots,K_{\mathrm{p}}$ are premium users while the remaining $K - K_{\mathrm{p}}$ users are standard users with the indices $k = K_{\mathrm{p}}+1,\ldots,K$.


(a) For an arbitrary transmit precoding scheme, design a fixed-point algorithm that obtains the optimal solution to the problem

$$\begin{aligned} \underset{P_1^{\mathrm{dl}},\ldots,P_K^{\mathrm{dl}}\geq 0}{\text{maximize}} \quad & \mathrm{SINR} \\ \text{subject to} \quad & \mathrm{SINR}_k \geq 4\,\mathrm{SINR}, \quad k = 1,\ldots,K_{\mathrm{p}}, \\ & \mathrm{SINR}_k \geq \mathrm{SINR}, \quad k = K_{\mathrm{p}}+1,\ldots,K, \\ & \sum_{k=1}^{K} P_k^{\mathrm{dl}} \leq P. \end{aligned} \tag{6.155}$$


(b) Suppose ZF precoding is utilized. Find a closed-form solution to the power 
allocation problem in (6.155). 


Exercise 6.13. Consider a base station with a ULA with M antennas and half-wavelength 
antenna spacing. Free-space LOS channels and K = 2 users are considered in the uplink. 
Suppose the users have equal channel gains $\beta_1 = \beta_2 = \beta$ and transmit with maximum power: $P_1^{\mathrm{ul}} = P_2^{\mathrm{ul}} = P$. Moreover, assume B = 10 MHz and $\frac{P\beta}{BN_0} = 1$. The users are located in the azimuth angle directions $\varphi_1 = 0$ and $\varphi_2 = \pi/8$, while the elevation angles are zero.


(a) For M = 4, compute the sum rate achieved with FDMA using MRC with the 
optimal bandwidth allocation. 


(b) For M = 4, compute the sum rate achieved with multi-user MIMO based on MRC. 
Compare the result with that of FDMA from (a). 


(c) Increase the number of base station antennas to M = 8. Compute the maximum 
sum rate achieved with FDMA. 


(d) For M = 8, compute the sum rate achieved with multi-user MIMO based on MRC. 
Compare the result with that of FDMA from (c). Is the gap between FDMA and 
multi-user MIMO increasing with the number of antennas? 


Exercise 6.14. Consider uplink multi-user MIMO with fast-fading channels, linear 
processing, and perfect CSI at the receiver. 


(a) What are the ergodic rate expressions when using MRC and ZF combining? 


(b) Assume i.i.d. Rayleigh fading. Compute closed-form lower bounds on the ergodic 
rates using Jensen’s inequality from Lemma 5.1. How do the resulting expressions 
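depend on M? Hint: Apply Jensen's inequality to the convex function $f(x) = \log_2(1 + 1/x)$, $x > 0$. Use that $\mathbb{E}\{1/\|\mathbf{h}_k\|^2\} = \frac{1}{(M-1)\beta_k}$ and $\mathbb{E}\{[(\mathbf{H}^{H}\mathbf{H})^{-1}]_{kk}\} = \frac{1}{(M-K)\beta_k}$ for i.i.d. Rayleigh fading channels [3, App. B.3].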


(c) Simplify the lower bounds from (b) by assuming the same channel gain $\beta$ and transmission with maximum power P for all users. What happens to the ratio of the lower bounds achieved with ZF and MRC as $M \to \infty$? What happens to their difference as $M \to \infty$?
Exercise 6.15. Consider downlink multi-user MIMO with fast-fading channels, linear
processing, and perfect CSI at the receiver. 


(a) What are the ergodic rates when using MRT and ZF precoding? 


(b) Assume i.i.d. Rayleigh fading. Compute closed-form lower bounds on the ergodic 
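rates using Jensen's inequality from Lemma 5.1. How do the resulting expressions depend on M? Hint: Apply Jensen's inequality to the convex function $f(x) = \log_2(1 + 1/x)$, $x > 0$. Use that $\mathbb{E}\{1/\|\mathbf{h}_k\|^2\} = \frac{1}{(M-1)\beta_k}$, $\mathbb{E}\{[(\mathbf{H}^{H}\mathbf{H})^{-1}]_{kk}\} = \frac{1}{(M-K)\beta_k}$, and $\mathbb{E}\{\mathbf{h}_k^{*}\mathbf{h}_k^{T}/\|\mathbf{h}_k\|^2\} = \frac{1}{M}\mathbf{I}_M$ for i.i.d. Rayleigh fading channels [3, App. B.3].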


(c) Simplify the lower bounds from (b) by assuming the same channel gain $\beta$ and equal power allocation $P_k^{\mathrm{dl}} = P/K$ among the users. What happens to the ratio of the lower bounds achieved with ZF precoding and MRT as $M \to \infty$? What happens to their difference as $M \to \infty$?


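Exercise 6.16. Consider an uplink multi-user MIMO system with K = 2 users and block-fading channels with inputs $x_1[l] \in \mathbb{C}$ and $x_2[l] \in \mathbb{C}$, for $l = 1,\ldots,L_{\mathrm{c}}$, where $L_{\mathrm{c}}$ is the number of symbols transmitted in each coherence block. The two users send simultaneous pilot sequences that span the initial $L_{\mathrm{p}} = 2$ symbols of each coherence block. For the base station to distinguish between the users' channels, the pilot sequences are selected as

$$\boldsymbol{\phi}_1 = \begin{bmatrix} x_1[1] \\ x_1[2] \end{bmatrix} = \sqrt{\frac{P}{B}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \boldsymbol{\phi}_2 = \begin{bmatrix} x_2[1] \\ x_2[2] \end{bmatrix} = \sqrt{\frac{P}{B}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \tag{6.156}$$

which are orthogonal vectors since $\boldsymbol{\phi}_1^{H}\boldsymbol{\phi}_2 = 0$. During the pilot transmission phase, the maximum uplink power P is used by both users. The received signal at the initial two time instances is

$$[\mathbf{y}[1]\ \mathbf{y}[2]] = \mathbf{h}_1\boldsymbol{\phi}_1^{T} + \mathbf{h}_2\boldsymbol{\phi}_2^{T} + [\mathbf{n}[1]\ \mathbf{n}[2]], \tag{6.157}$$

where $\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the independent receiver noise. During the $L_{\mathrm{c}} - L_{\mathrm{p}} = L_{\mathrm{c}} - 2$ remaining symbols of each coherence block, the received signal is

$$\mathbf{y}[l] = \mathbf{h}_1 x_1[l] + \mathbf{h}_2 x_2[l] + \mathbf{n}[l], \quad l = 3,\ldots,L_{\mathrm{c}}, \tag{6.158}$$

where $x_1[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_1^{\mathrm{ul}}/B)$ and $x_2[l] \sim \mathcal{N}_{\mathbb{C}}(0, P_2^{\mathrm{ul}}/B)$.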


(a) Compute the MMSE estimates of $\mathbf{h}_1, \mathbf{h}_2$ based on the received signals in (6.157), assuming that $\mathbf{h}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta_k\mathbf{I}_M)$, for k = 1,2. Hint: Multiply (6.157) from the right-hand side with $\boldsymbol{\phi}_1^{*}$ and $\boldsymbol{\phi}_2^{*}$, respectively, to obtain two interference-free received signals. You can then follow the approach from (5.137).

(b) Suppose the base station applies MRC to the received signal in (6.158) based on the estimated channels, $\mathbf{w}_k = \hat{\mathbf{h}}_k$ for k = 1,2. Compute the achievable rates by treating the channel estimation error and the interference as noise. Hint: Use Corollary 5.2.
Exercise 6.17. Consider uplink multi-user MIMO with i.i.d. Rayleigh slow-fading channels, ZF combining, and perfect CSI at the receiver.


(a) Show that the outage probability $P_{\mathrm{out},k}(R_k)$ when the rate $R_k$ [bit/s] is used for user k can be expressed involving an $(M-K+1)$-dimensional vector $\tilde{\mathbf{h}}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta_k\mathbf{I}_{M-K+1})$. This is equivalent to showing that each user experiences an interference-free channel with $M-K+1$ degrees of freedom. Hint: Use the result
from Example 6.6 with ùk = (Af*)Fh,., 


(b) Obtain an upper bound on the outage probability $P_{\mathrm{out},k}(R_k)$ using the bound from (5.54), and find the diversity order.


Exercise 6.18. Consider a multi-user MIMO where each user has N antennas, and 
perfect CSI is available everywhere. 


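(a) Consider the uplink and suppose that user k uses a specific precoding matrix $\mathbf{P}_k^{\mathrm{ul}} \in \mathbb{C}^{N\times N}$, which has unit-norm columns and is known at the base station. The transmitted signal is generated as $\mathbf{P}_k^{\mathrm{ul}}\mathbf{x}_k^{\mathrm{ul}}$, where the data vector is distributed as $\mathbf{x}_k^{\mathrm{ul}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{Q}_k^{\mathrm{ul}})$, for k = 1,...,K. The covariance matrix $\mathbf{Q}_k^{\mathrm{ul}} \in \mathbb{C}^{N\times N}$ is the diagonal power allocation matrix with $\mathrm{tr}(\mathbf{Q}_k^{\mathrm{ul}})$ being the user's total transmit power. The received signal at the base station is

$$\mathbf{y}^{\mathrm{ul}} = \sum_{k=1}^{K}\mathbf{H}_k\mathbf{P}_k^{\mathrm{ul}}\mathbf{x}_k^{\mathrm{ul}} + \mathbf{n}^{\mathrm{ul}}, \tag{6.159}$$

where $\mathbf{H}_k \in \mathbb{C}^{M\times N}$ is the channel matrix from user k to the base station and $\mathbf{n}^{\mathrm{ul}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is the independent receiver noise. What is the achievable data rate for user k if the interference from other users is treated as colored noise (i.e., linear receiver processing is used)?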


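(b) Consider the downlink and suppose that the base station uses a specific precoding matrix $\mathbf{P}_k^{\mathrm{dl}} \in \mathbb{C}^{M\times N}$ for user k, for k = 1,...,K, which has unit-norm columns and is known at the users. The transmitted signal is generated as $\sum_{k=1}^{K}\mathbf{P}_k^{\mathrm{dl}}\mathbf{x}_k^{\mathrm{dl}}$. The data vector is distributed as $\mathbf{x}_k^{\mathrm{dl}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{Q}_k^{\mathrm{dl}})$, where $\mathbf{Q}_k^{\mathrm{dl}} \in \mathbb{C}^{N\times N}$ is the diagonal power allocation matrix with $\sum_{k=1}^{K}\mathrm{tr}(\mathbf{Q}_k^{\mathrm{dl}})$ being the total transmit power. The received signal at user k is

$$\mathbf{y}_k^{\mathrm{dl}} = \mathbf{H}_k^{T}\sum_{i=1}^{K}\mathbf{P}_i^{\mathrm{dl}}\mathbf{x}_i^{\mathrm{dl}} + \mathbf{n}_k^{\mathrm{dl}}, \tag{6.160}$$

where $\mathbf{n}_k^{\mathrm{dl}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_N)$ is the independent receiver noise. What is the achievable data rate for user k if the interference from signals meant for other users is treated as colored noise (i.e., linear receiver processing is used)?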


Chapter 7 


Wideband MIMO Channels and Practical Aspects 


Practical communication systems utilize vast bandwidths to the extent that the 
channel coefficients vary over it, which might result in inter-symbol interference. 
In this chapter, we extend the previously developed MIMO theory to handle 
these situations. We will first show how multicarrier modulation appears as 
the natural transmission method when dealing with inter-symbol interference. 
We then derive the resulting multicarrier MIMO capacity and describe how 
the subcarrier channels depend on the multipath clusters. Next, we discuss 
practical hardware implementation of precoding and combining, and when 
the typical digital architecture can be simplified into an analog or hybrid 
architecture. Finally, we will exemplify two practical MIMO implementations 
and elaborate on different MIMO-related terminologies and their meanings. 


7.1 Basics of Multicarrier Modulation 


The analysis and algorithmic development in previous chapters were based 
on the discrete memoryless channel model derived in Section 2.3.4. To reach 
that model, we made the narrowband signal assumption, which essentially 
means that the time interval 1/B between two transmitted symbols is much 
larger than the delay spread, which is the variation in delay between the 
fastest and slowest propagation paths in the propagation environment. Under 
that condition, delayed copies of the previous symbols will not interfere with 
the currently transmitted symbol. One can get an intuitive sense of this 
phenomenon by listening to acoustic waves. When we hear speech or music, 
the waves will be reflected on various objects before reaching the listener. In a 
normal-sized room, the delay spread of acoustic waves is smaller than 50 ms, 
giving rise to the reverberation effect where each distinct sound becomes less 
sharp but still apprehensible and sometimes perceived as more pleasant to the 
ears. In contrast, in a large room or outdoor environment with a delay spread 
larger than 50 ms, there can be distinct echoes that disturb the listening
experience. When there are echoes of this kind, the acoustic channel is said 
to have a memory. The same physical principles apply to radio waves, but
the bandwidth and propagation speed are entirely different. 

Many wireless communication systems designed for broadband connectivity 
use more bandwidth than permitted under the narrowband signal assumption. 
Hence, we want to design systems that function irrespective of whether the 
environment has a long or short delay spread. To model such wideband 
channels, we return to the received signal y[l] in (2.128) at symbol time l,
which was obtained before making the narrowband signal assumption: 


$$y[l] = \sum_{k=-\infty}^{\infty} h[l-k]\,x[k] + n[l], \tag{7.1}$$


where x[k] is the transmitted symbol at time k, $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the additive receiver noise, and the communication channel is represented by the coefficients

$$h[k] = \sum_{i=1}^{L}\alpha_i e^{-\imath 2\pi f_{\mathrm{c}}\tau_i}\,\mathrm{sinc}(k + B(\eta - \tau_i)). \tag{7.2}$$

These coefficients describe L propagation paths for which path i has the attenuation $\alpha_i$ and the delay $\tau_i$, while $\eta$ is the sampling delay at the receiver. The important thing in this chapter will not be the exact channel model in (7.2) but the general structure in (7.1). The received signal y[l] contains a weighted summation of many transmitted symbols {x[k]}. The copy of x[k] received at time l is multiplied by the weight denoted by h[l − k].

The sinc-function appears in (7.2) because it was utilized in Section 2.3.2 as 
the pulse p(t) in the PAM transmission and for bandpass receiver filtering that 
removes noise outside the signal band. This function satisfies the transmission 
design requirements from that section while requiring the minimum bandwidth. 
However, the downside is that it has a long time duration around its peak value, 
spanning both forward and backward in time. Strictly speaking, sinc(Bt) has 
an infinite duration, but 90% of its energy is in the interval $t \in [-1/B, 1/B]$
and roughly 99% in the interval $t \in [-8/B, 8/B]$. We will refer to the latter as the
effective time duration of the pulse, and the fact that it is much larger than 
the symbol time is important when characterizing the channel coefficients in 
(7.2). Recall that (7.2) is obtained in (2.126) by sampling the function 


$$(p * g * p)(t) = \sum_{i=1}^{L}\alpha_i e^{-\imath 2\pi f_{\mathrm{c}}\tau_i}\,\mathrm{sinc}(B(t + \eta - \tau_i)) \tag{7.3}$$


at the time instance t = k/B, where k is an integer. This function is illustrated


in Figure 7.1 for L = 3 paths with amplitudes and delays specified in the 
legend. The three path components in (7.3) are shown individually, and all 
take the form of an attenuated and delayed sinc-function. The summation of 
these components results in the dotted curve that represents (p * g * p)(t). By 
[Figure 7.1 legend: 0.7 sinc(Bt), 0.5 sinc(Bt − 0.4), 0.2 sinc(Bt − 0.8).]
Figure 7.1: Example of a channel with L = 3 distinct paths with different amplitudes and delays
(the complex phase-shifts are neglected). The summation of these paths results in (p * g * p)(t)
in (7.3). By sampling this signal at k/B, where k is an integer, we get the channel coefficients
h[k] defined in (7.2).


taking samples of this function at time instances k/B, where k is an integer, 
we get the channel coefficients h[k] in (7.2). Interestingly, these coefficients 
are non-zero for both positive and negative values of k, but the oscillations 
become smaller as |k| increases. The fact that h[k] can be non-zero for negative 
indices should not be interpreted as having an unrealistic non-causal system 
but highlights that real pulse functions start long before they reach their 
peak values. Importantly, the three paths give rise to much more than three 
non-zero channel coefficients due to the pulse’s long effective time duration. 
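The three-path example of Figure 7.1 can be reproduced numerically from (7.2). The Python sketch below neglects the carrier phase shifts, as in the figure, sets $B = 1$ and $\eta = \min_i \tau_i$ for illustration, and relies on `np.sinc`, which is the normalized sinc $\sin(\pi x)/(\pi x)$ used in the text. It also estimates the share of the energy of $\mathrm{sinc}(Bt)$ inside $[-1/B, 1/B]$ and $[-8/B, 8/B]$, which comes out at about 90% and just under 99%, respectively. The helper names are hypothetical.

```python
import numpy as np

def channel_taps(k, alphas, taus, eta, B):
    """Channel coefficients h[k] from (7.2), neglecting the carrier phase shifts as in Figure 7.1.
    np.sinc(x) = sin(pi x) / (pi x), the same normalized sinc used in the text."""
    k = np.asarray(k)[:, None]
    return np.sum(np.asarray(alphas) * np.sinc(k + B * (eta - np.asarray(taus))), axis=1)

def sinc_energy_fraction(a, n=200_000):
    """Fraction of the energy of sinc(Bt) inside |t| <= a/B (midpoint rule in x = Bt)."""
    dx = 2 * a / n
    x = -a + dx * (np.arange(n) + 0.5)
    return np.sum(np.sinc(x) ** 2) * dx             # the total energy of sinc(x) is 1

B = 1.0                                              # illustrative bandwidth
alphas, taus = [0.7, 0.5, 0.2], [0.0, 0.4 / B, 0.8 / B]   # the three paths of Figure 7.1
ks = np.arange(-4, 5)
h = channel_taps(ks, alphas, taus, eta=min(taus), B=B)
for k_val, h_val in zip(ks, h):
    print(f"h[{k_val:+d}] = {h_val:+.3f}")          # non-zero also for negative k
print(sinc_energy_fraction(1.0), sinc_energy_fraction(8.0))
```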

Practical systems mitigate this effect by using pulses with a shorter effective 
time duration than the sinc-function, represented by a faster decay around the 
peak value. However, all feasible pulses have a non-zero effective time duration, 
so this issue cannot be fully alleviated. Hence, even if the sampling delay
is selected as $\eta = \min_i \tau_i$ to match with the peak of the fastest propagation
path (as was done in Figure 7.1), there will be $h[k] \neq 0$ for negative values of
k. To achieve a causal discrete-time system model, we should instead select $\eta$
to take the first sample of the received signal at the beginning of the pulse


¹The pulse p(t) must satisfy the Nyquist criterion, which for a given symbol rate B requires that p(k/B) = 0 for non-zero integers k and results in a signal bandwidth of at least B. In theory, we could minimize the effective time duration by using a rectangle-shaped pulse that is only non-zero in the interval $t \in [-1/(2B), 1/(2B)]$, but it will have a huge bandwidth (the Fourier transform is a sinc-function). A common practical choice is the so-called root-raised-cosine pulse, for which one can conveniently control the tradeoff between the effective time duration and the excess bandwidth compared to B (required by the sinc-pulse). For example, with 25% excess bandwidth, 99% of the energy is contained in the interval $t \in [-2/B, 2/B]$.
that arrives through the fastest propagation path. The number of samples should be selected to take the last sample at the end of the pulse that arrives through the slowest propagation path. The relevant parameters are then the delay spread

$$\tau_{\text{spread}} = \max_{i \in \{1,\ldots,L\}} \tau_i - \min_{i \in \{1,\ldots,L\}} \tau_i \tag{7.4}$$

of the channel and the integer number of periods $N_{\text{pulse}}$ for which the pulse takes values that cannot be approximated as zero; that is, $N_{\text{pulse}}$ is the smallest even² integer such that $(p * p)(t) = \operatorname{sinc}(Bt) \approx 0$ for $|t| > N_{\text{pulse}}/(2B)$. Since $N_{\text{pulse}}$ and $\tau_{\text{spread}}$ are finite in practice, we can describe the channel using a finite number of channel coefficients h[k], whose number we will denote as T + 1 in the remainder of this chapter. If we select the sampling delay as

$$\eta = \min_{i \in \{1,\ldots,L\}} \tau_i - \frac{N_{\text{pulse}} - 2}{2B}, \tag{7.5}$$


then the fastest path in (7.2), with the smallest $\tau_i$, will contain the time-shifted pulse $\operatorname{sinc}(k - N_{\text{pulse}}/2 + 1)$, which can be approximated as zero for all k < 0. Since all other propagation paths are slower, we can conclude that $h[k] \approx 0$ for all k < 0 in (7.1). Moreover, the slowest path (with the largest $\tau_i$) will contain the time-shifted pulse $\operatorname{sinc}(k - B\tau_{\text{spread}} - N_{\text{pulse}}/2 + 1)$, which can be approximated as zero for $k > B\tau_{\text{spread}} + N_{\text{pulse}} - 1$. Hence, the channel coefficient with the largest time index that we need to consider in (7.1) is h[T] with

$$T = \lfloor B\tau_{\text{spread}} \rfloor + N_{\text{pulse}} - 1, \tag{7.6}$$

where $\lfloor \cdot \rfloor$ rounds its argument down to the nearest integer.
In summary, when selecting the sampling delay as in (7.5), the summation in (7.1) will approximately end at k = l and contain T + 1 terms:

$$y[l] = \sum_{k=l-T}^{l} h[l-k]\,x[k] + n[l] = \sum_{\ell=0}^{T} h[\ell]\,x[l-\ell] + n[l], \tag{7.7}$$

where the last equality follows from changing the summation index from k to ℓ = l − k. We notice that the channel now behaves as a causal FIR filter of order T with the non-zero coefficients h[0],...,h[T] as the impulse response. These coefficients are the discrete-time representation of the channel and can be computed based on the physical channel using (7.2) and (7.5).
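To make the tap computation concrete, the following Python sketch evaluates (7.5) and (7.6) for a set of paths and then samples the received pulses. It assumes, as in Figure 7.1, that the complex phase-shifts are neglected, so that path i contributes a real gain `gains[i]` times the pulse sinc(k + B(η − τ_i)). This form of (7.2) and the helper name `discrete_taps` are illustrative assumptions, not the book's notation.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def discrete_taps(gains, delays, B, n_pulse):
    """Hypothetical helper: sampling delay (7.5), order T (7.6), and taps h[0..T].

    Assumption: path i contributes gains[i] * sinc(k + B * (eta - delays[i]))
    to h[k], i.e., a sinc-pulse model with the phase-shifts neglected.
    n_pulse must be even, as required in the text.
    """
    tau_min, tau_max = min(delays), max(delays)
    eta = tau_min - (n_pulse - 2) / (2 * B)                  # (7.5)
    T = math.floor(B * (tau_max - tau_min)) + n_pulse - 1    # (7.6)
    taps = [sum(a * sinc(k + B * (eta - tau)) for a, tau in zip(gains, delays))
            for k in range(T + 1)]
    return eta, T, taps


# Three paths with delays that are integer multiples of 1/B (normalized B = 1).
eta, T, h = discrete_taps(gains=[1.0, 0.6, 0.3], delays=[0.0, 2.0, 5.0],
                          B=1.0, n_pulse=8)
print(eta, T)   # -3.0 12
```

Because the delays in this example are integer multiples of 1/B, each path lands on exactly one sample: h[3] = 1.0, h[5] = 0.6, and h[8] = 0.3, up to floating-point rounding. The fastest path peaks at k = N_pulse/2 − 1 = 3, as intended by (7.5). With non-integer delays, every path leaks into all T + 1 taps.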


²The considered sinc-pulse is symmetric around its peak value in the time domain; thus, we
should consider the same number of periods before and after the peak value. Since the fastest 
path is typically the strongest one, it is essential to take samples when the pulse received over 
that path reaches its peak value to make the corresponding path as strong as possible. 


7.1. Basics of Multicarrier Modulation 493 


The discrete-time system model in (7.7) describes a dispersive channel with 
a memory of T previous symbols; that is, the received signal y[l] contains not only the currently transmitted signal x[l] but also inter-symbol interference from x[l−1],...,x[l−T]. There are multiple ways of dealing with this interference.
We can remove the interference by “transmitting” T zero-valued symbols after 
each data symbol so that the inter-symbol interference becomes zero. This 
approach will reduce the symbol rate from B to B/(T + 1) and is more-or-less 
equivalent to the narrowband signal assumption since we effectively reduce 
the signal bandwidth to alleviate inter-symbol interference. Another option is 
to design a digital receiver filter that inverts the operation of the FIR filter 
of the channel. This is known as single-carrier transmission. In this chapter, 
we will focus on a third option: divide the bandwidth into multiple frequency 
subcarriers that each can be modeled as a memoryless channel. 


7.1.1 Orthogonal Frequency-Division Multiplexing (OFDM) 


If a narrowband signal can be transmitted over a small piece of bandwidth $B_{\text{narrow}}$ without generating inter-symbol interference, then it must be possible to take a larger bandwidth B, divide it equally into $B/B_{\text{narrow}}$ pieces with bandwidth $B_{\text{narrow}}$, and transmit separate narrowband signals in each of them. Orthogonal frequency-division multiplexing (OFDM) is a way to implement this
procedure without requiring a strict bandwidth division or separate hardware 
components for each piece of bandwidth. OFDM has become the standard 
digital transmission method in WiFi, LTE, NR, and many other standards. 

The main characteristic of OFDM is that the transmitted time-domain 
symbols {x[k]} in (7.7) are not equal to the data symbols, but they are instead 
designed to convey different data over different parts of the frequency band. 
To achieve this, we would like to transform the wideband channel in (7.7) 
into the frequency domain using the DFT that was defined in Section 2.8. In 
this section, we will show that this is the optimal way of operating under the 
assumption that the time-domain signal has a block-wise cyclic structure. 

Suppose we want to transmit a block of S symbols, called x[0],...,x[S−1], over the channel in (7.7). For any given value of T, determined by the propagation environment, we can always select S > T since we are the ones designing the communication protocol. Since the channel has a memory of T previous symbols, we must control what is transmitted during the T symbol times before time 0. In particular, we will append a cyclic prefix to obtain the following cyclic sequence of length S + T, whose element at time k is

$$\begin{cases} x[k], & k = 0,\ldots,S-1,\\ x[k+S], & k = -T,\ldots,-1. \end{cases} \tag{7.8}$$


This procedure of creating one transmission block is illustrated in Figure 7.2, 
where the complete transmitted signal consists of {x[k] : k = −T,...,S − 1}.
Figure 7.2: Each block in an OFDM transmission consists of S data symbols and a cyclic prefix containing the last T symbols.
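The effect of the cyclic prefix can be checked numerically. The Python sketch below uses illustrative helper names. It appends the prefix from (7.8) and passes the block through the noise-free FIR channel (7.7), assuming zeros were sent before the block. It then discards the first T received samples and compares the result with the cyclic convolution that appears in (7.9).

```python
def add_cyclic_prefix(block, T):
    """Prepend copies of the last T symbols of the block, as in (7.8)."""
    return block[len(block) - T:] + block


def fir_channel(h, tx):
    """Noise-free FIR channel (7.7); samples before tx[0] are taken as zero."""
    return [sum(h[i] * tx[l - i] for i in range(len(h)) if l - i >= 0)
            for l in range(len(tx))]


def cyclic_convolution(h, x):
    """sum_i h[i] x[(l - i) mod S] for l = 0, ..., S - 1."""
    S = len(x)
    return [sum(h[i] * x[(l - i) % S] for i in range(len(h))) for l in range(S)]


h = [1.0, 0.5, -0.25]             # T = 2 taps (arbitrary example values)
x = [1.0, 2.0, -1.0, 0.5, 3.0]    # one block with S = 5 symbols
T = len(h) - 1
received = fir_channel(h, add_cyclic_prefix(x, T))
y = received[T:]                  # discard the samples that overlap the prefix
print(y == cyclic_convolution(h, x))   # True
```

Without the prefix, the first T outputs would instead contain the unknown samples transmitted before the block, and the equality would fail.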


Since we added the last T symbols as a prefix, we can interpret the received 
signal in (7.7) as the cyclic convolution 


$$y[l] = \sum_{\ell=0}^{T} h[\ell]\,x[l-\ell] + n[l] = \sum_{\ell=0}^{T} h[\ell]\,x[(l-\ell) \bmod S] + n[l], \quad l = 0,\ldots,S-1, \tag{7.9}$$

between the input signal sequence {x[s] : s = 0,...,S − 1} and the sequence {h[ℓ] : ℓ = 0,...,T} with the channel taps, plus the independent noise $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$. The cyclic convolution and its properties were previously
discussed in Section 2.8.2. Thanks to the cyclic prefix, we can write the 
relationship between the S received signals and S transmitted signals in 
matrix-vector form as 


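$$\underbrace{\begin{bmatrix} y[0]\\ \vdots\\ y[S-1] \end{bmatrix}}_{=\mathbf{y}} = \mathbf{C}_h \underbrace{\begin{bmatrix} x[0]\\ \vdots\\ x[S-1] \end{bmatrix}}_{=\mathbf{x}} + \underbrace{\begin{bmatrix} n[0]\\ \vdots\\ n[S-1] \end{bmatrix}}_{=\mathbf{n}}, \tag{7.10}$$

where the channel is represented by the S × S circulant matrix

$$\mathbf{C}_h = \begin{bmatrix} h[0] & h[S-1] & \cdots & h[2] & h[1]\\ h[1] & h[0] & h[S-1] & \cdots & h[2]\\ \vdots & h[1] & h[0] & \ddots & \vdots\\ h[S-2] & & \ddots & \ddots & h[S-1]\\ h[S-1] & h[S-2] & \cdots & h[1] & h[0] \end{bmatrix}, \tag{7.11}$$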
³In principle, we could also consider the previously received signals y[−T],...,y[−1] that contain a combination of the signals in the cyclic prefix and signals that were transmitted even earlier in time, but these received signals are normally discarded in OFDM since they contain interference from even earlier signals that are generally unknown. Even if these are previous data symbols for which estimates are available at the receiver, error propagation effects can be created if we rely on them for decoding the new block of symbols.
Figure 7.3: The operation of an OFDM system is divided into blocks of S symbols (plus a cyclic prefix). The transmission in a block can be expressed as a discrete memoryless MIMO channel with vector input $\mathbf{x} \in \mathbb{C}^S$ and vector output $\mathbf{y} \in \mathbb{C}^S$. The channel matrix $\mathbf{C}_h$ in (7.11) is circulant and the independent noise vector $\mathbf{n}$ is complex Gaussian distributed.


which contains the FIR filter taps h[0],...,h[T] that have been padded with the zero-valued taps h[T + 1] = ... = h[S − 1] = 0 when S − 1 > T for notational convenience.
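The circulant structure of (7.11) follows from the rule that entry (l, s) equals h[(l − s) mod S]. The Python sketch below, with illustrative helper names, builds the matrix from the zero-padded taps and confirms that the matrix-vector product reproduces the cyclic convolution in (7.9).

```python
def circulant(h, S):
    """S x S circulant matrix (7.11): entry (l, s) is h[(l - s) mod S].

    Taps h[T+1], ..., h[S-1] are zero-padded, as in the text (requires S > T).
    """
    padded = list(h) + [0] * (S - len(h))
    return [[padded[(l - s) % S] for s in range(S)] for l in range(S)]


def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


C = circulant([1, 2, 3], 5)   # taps h[0] = 1, h[1] = 2, h[2] = 3, so T = 2
for row in C:
    print(row)
# [1, 0, 0, 3, 2]
# [2, 1, 0, 0, 3]
# [3, 2, 1, 0, 0]
# [0, 3, 2, 1, 0]
# [0, 0, 3, 2, 1]
```

Each row is a cyclic shift of the previous one. This is the property that makes the DFT diagonalize the matrix.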

Interestingly, there is a mathematical equivalence between (7.10) and the system model of a point-to-point MIMO channel with S inputs, S outputs, and the channel matrix $\mathbf{C}_h$. We can write (7.10) in the familiar MIMO-like form

$$\mathbf{y} = \mathbf{C}_h\mathbf{x} + \mathbf{n} \tag{7.12}$$

and Figure 7.3 shows the corresponding block diagram. We recall from Section 3.4 that the capacity of such a channel is achieved by diagonalizing the channel matrix, thereby creating S parallel memoryless subchannels. Since the channel matrix $\mathbf{C}_h$ of an OFDM system is a circulant matrix, its eigendecomposition has a simple form that was derived in Section 2.8.2:

$$\mathbf{C}_h = \mathbf{F}_S^H \mathbf{D}_h \mathbf{F}_S, \tag{7.13}$$

where $\mathbf{F}_S$ is the DFT matrix defined in (2.198) and $\mathbf{D}_h$ is the diagonal matrix


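$$\mathbf{D}_h = \begin{bmatrix} \bar{h}[0] & 0 & \cdots & 0\\ 0 & \bar{h}[1] & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \bar{h}[S-1] \end{bmatrix} \tag{7.14}$$

containing the frequency response of the FIR filter. It is computed as

$$\bar{h}[\nu] = \sum_{\ell=0}^{T} h[\ell]\, e^{-j2\pi \ell \nu / S}, \quad \text{for } \nu = 0,\ldots,S-1. \tag{7.15}$$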


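Suppose we let the transmitter generate the time-domain signal sequence $\mathbf{x}$ as

$$\mathbf{x} = \mathbf{F}_S^H \bar{\mathbf{x}} \tag{7.16}$$

for some data-bearing vector $\bar{\mathbf{x}} \in \mathbb{C}^S$. If the receiver multiplies the received signal sequence $\mathbf{y}$ with the DFT matrix as $\mathbf{F}_S\mathbf{y}$, it will obtain

$$\bar{\mathbf{y}} = \mathbf{F}_S\mathbf{y} = \mathbf{F}_S(\mathbf{C}_h\mathbf{F}_S^H\bar{\mathbf{x}} + \mathbf{n}) = \underbrace{\mathbf{F}_S\mathbf{F}_S^H}_{=\mathbf{I}_S}\mathbf{D}_h\underbrace{\mathbf{F}_S\mathbf{F}_S^H}_{=\mathbf{I}_S}\bar{\mathbf{x}} + \mathbf{F}_S\mathbf{n} = \mathbf{D}_h\bar{\mathbf{x}} + \bar{\mathbf{n}}, \tag{7.17}$$

which has the same form as a MIMO channel with the diagonal channel matrix $\mathbf{D}_h$ and the rotated noise vector

$$\bar{\mathbf{n}} = \begin{bmatrix} \bar{n}[0]\\ \vdots\\ \bar{n}[S-1] \end{bmatrix} = \mathbf{F}_S\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_S). \tag{7.18}$$

To obtain this result, we have utilized the eigendecomposition in (7.13) and the fact that the DFT matrix $\mathbf{F}_S$ is unitary. The latter property makes $\mathbf{F}_S\mathbf{F}_S^H = \mathbf{I}_S$ and ensures that the rotated noise vector $\bar{\mathbf{n}}$ contains independent entries with the same variance as $\mathbf{n}$. The transmitter and receiver processing that creates the S parallel SISO channels is summarized in Figure 7.4(a).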
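The diagonalization in (7.17) can be verified numerically. The Python sketch below assumes that the DFT matrix in (2.198) has entries $e^{-j2\pi\nu s/S}/\sqrt{S}$, which is the unitary convention consistent with the sign in (7.15). It forms $\mathbf{F}_S\mathbf{C}_h\mathbf{F}_S^H$ for arbitrary example taps and checks that the result is diagonal, with the frequency response on the diagonal. All helper names are illustrative.

```python
import cmath
import math


def dft_matrix(S):
    """Unitary DFT matrix: entry (nu, s) = exp(-j 2 pi nu s / S) / sqrt(S)."""
    return [[cmath.exp(-2j * math.pi * nu * s / S) / math.sqrt(S) for s in range(S)]
            for nu in range(S)]


def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    """Plain matrix-matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def circulant(h, S):
    """Circulant channel matrix (7.11) with zero-padded taps."""
    padded = list(h) + [0] * (S - len(h))
    return [[padded[(l - s) % S] for s in range(S)] for l in range(S)]


S, h = 6, [1.0, 0.4 - 0.3j, 0.2j]                      # T = 2, S = 6
F = dft_matrix(S)
D = matmul(matmul(F, circulant(h, S)), hermitian(F))  # F_S C_h F_S^H
hbar = [sum(h[l] * cmath.exp(-2j * math.pi * l * nu / S) for l in range(len(h)))
        for nu in range(S)]                            # frequency response (7.15)
off_diag = max(abs(D[i][j]) for i in range(S) for j in range(S) if i != j)
diag_err = max(abs(D[nu][nu] - hbar[nu]) for nu in range(S))
print(off_diag < 1e-12, diag_err < 1e-12)   # True True
```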

We used the eigendecomposition to diagonalize the channel matrix $\mathbf{C}_h$, while the SVD was used for the same purpose in Section 3.4. These decompositions are closely related but differ in whether the diagonal matrix is real or complex.⁴ The eigendecomposition has a simpler form but only exists for
square matrices (as in this section), while the SVD always exists. 


Example 7.1. Compute the frequency responses with S = 4 subcarriers for the following channels. The first channel has $h_1[0] = 1$ but $h_1[\ell] = 0$ for $\ell \neq 0$. The second channel has $h_2[0] = h_2[1] = 1$, while $h_2[\ell] = 0$ for any other ℓ.

The frequency responses of these channels are obtained from (7.15) as

$$\bar{h}_1[\nu] = h_1[0]\,e^{0} = 1, \tag{7.19}$$

$$\bar{h}_2[\nu] = h_2[0]\,e^{0} + h_2[1]\,e^{-j2\pi\nu/4} = 1 + e^{-j\pi\nu/2} = 2e^{-j\pi\nu/4}\cos(\pi\nu/4), \tag{7.20}$$

for ν = 0,...,3, where the last equality follows from Euler's formula.

The magnitude of the first channel's frequency response is 1 on all subcarriers, so this channel is frequency-flat. This is a consequence of only having a single tap. On the other hand, the magnitude of the second channel's frequency response is $2|\cos(\pi\nu/4)|$, which results in the values $|\bar{h}_2[0]| = 2$, $|\bar{h}_2[1]| = \sqrt{2}$, $|\bar{h}_2[2]| = 0$, $|\bar{h}_2[3]| = \sqrt{2}$ on the different subcarriers. This channel has frequency-varying characteristics since the two taps superimpose differently between the subcarriers.
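The numbers in Example 7.1 can be reproduced with a direct implementation of (7.15). This Python sketch uses an illustrative function name and checks the magnitudes against the closed form 2|cos(πν/4)| from (7.20).

```python
import cmath
import math


def frequency_response(h, S):
    """(7.15): bar_h[nu] = sum_l h[l] exp(-j 2 pi l nu / S), nu = 0, ..., S - 1."""
    return [sum(h[l] * cmath.exp(-2j * math.pi * l * nu / S) for l in range(len(h)))
            for nu in range(S)]


h1 = frequency_response([1.0], 4)        # first channel: a single tap
h2 = frequency_response([1.0, 1.0], 4)   # second channel: two equal taps
print([round(abs(v), 4) for v in h1])    # [1.0, 1.0, 1.0, 1.0]
print([round(abs(v), 4) for v in h2])    # [2.0, 1.4142, 0.0, 1.4142]
```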


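⁴The SVD $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ of $\mathbf{C}_h$ has the matrix $\boldsymbol{\Sigma} = \operatorname{diag}(|\bar{h}[0]|,\ldots,|\bar{h}[S-1]|)$ with singular values, which are the magnitudes of the corresponding entries of $\mathbf{D}_h$ in the eigendecomposition. The unitary matrices can be selected as $\mathbf{U} = \mathbf{F}_S^H\mathbf{D}_h\boldsymbol{\Sigma}^{-1}$ and $\mathbf{V} = \mathbf{F}_S^H$.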
(a) Transmitter and receiver processing that diagonalizes the OFDM channel.

(b) Equivalent representation with S parallel SISO channels.

Figure 7.4: The transmission of an S-length block in an OFDM system can be represented as a MIMO channel where the channel matrix has the eigendecomposition $\mathbf{C}_h = \mathbf{F}_S^H\mathbf{D}_h\mathbf{F}_S$. Hence, the transmitter and receiver can process the signals using the S × S DFT matrix as shown in (a) to achieve S parallel SISO channels as shown in (b).


We have now derived the system operation generally referred to as OFDM. 
The reason for calling it orthogonal frequency-division multiplexing is that 
we multiplex the S data symbols in x using the frequency domain. More 
precisely, we generate the transmitted sequence x of time-domain symbols 
using the IDFT as $\mathbf{x} = \mathbf{F}_S^H\bar{\mathbf{x}}$, which implies that $\bar{\mathbf{x}}$ is the frequency-domain representation of the transmitted signal. Similarly, the receiver obtains the received signals $\mathbf{y}$ in the time domain and computes its DFT $\bar{\mathbf{y}} = \mathbf{F}_S\mathbf{y}$. We thereby obtain S parallel (orthogonal) discrete memoryless channels

$$\bar{y}[\nu] = \bar{h}[\nu]\,\bar{x}[\nu] + \bar{n}[\nu], \quad \text{for } \nu = 0,\ldots,S-1,$$

as illustrated in Figure 7.4(b). We call these subcarriers since OFDM divides the wideband channel into S equally spaced subchannels in the frequency domain. The frequency value of a given subcarrier depends on how we measure frequencies. Subcarrier ν utilizes the normalized frequency ν/S, but since we use a symbol rate equal to the bandwidth B, this corresponds to the unnormalized frequency νB/S in the complex baseband. Moreover, as described in Section 2.8.1, the DFT is a periodic function where each normalized frequency has aliases at ν/S + n for any integer n. This ambiguity is due to the sampling
(b) Real passband representation.

Figure 7.5: The S subcarriers in an OFDM system are equally spaced over the bandwidth B and centered around the carrier frequency, which is 0 in the complex baseband representation shown in (a) and $f_c$ in the real passband representation shown in (b). The subcarrier index ν counts subcarriers from the center towards the right and then continues from left to center.


and implies that many different continuous-time frequencies can give rise to 
the same discrete-time frequency. Since we analyze the complex baseband 
representation of a passband signal and take samples at the symbol rate, we 
know from Figure 2.9 that the actual frequencies occur in the interval from 
—B/2 to B/2. Each subcarrier v only has one alias in that range; hence, its 
true frequency is 

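$$\begin{cases} \dfrac{\nu B}{S}, & \text{if } 0 \leq \nu < \dfrac{S}{2},\\[1ex] \dfrac{(\nu - S)B}{S}, & \text{if } \dfrac{S}{2} \leq \nu < S, \end{cases} \tag{7.21}$$

which is aligned with the symmetric range of positive and negative normalized frequencies shown in Figure 2.29. Figure 7.5(a) illustrates the location of the subcarriers along the frequency axis and which subcarrier index ν gives rise to each of them. The figure considers the case when S is odd, while the outermost frequency values change slightly when S is even. We can multiply (2.207) by B to obtain a list of all the subcarrier frequencies in the complex baseband:

$$\left\{-\frac{B\lfloor S/2\rfloor}{S}, \ldots, -\frac{B}{S}, 0, \frac{B}{S}, \ldots, \frac{B(\lceil S/2\rceil - 1)}{S}\right\} = \begin{cases} \left\{-\dfrac{B}{2}, \ldots, -\dfrac{B}{S}, 0, \dfrac{B}{S}, \ldots, \dfrac{B}{2} - \dfrac{B}{S}\right\}, & \text{if } S \text{ is even},\\[1ex] \left\{-\dfrac{B}{2} + \dfrac{B}{2S}, \ldots, -\dfrac{B}{S}, 0, \dfrac{B}{S}, \ldots, \dfrac{B}{2} - \dfrac{B}{2S}\right\}, & \text{if } S \text{ is odd}. \end{cases} \tag{7.22}$$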
The separation B/S between two adjacent subcarrier frequencies is called 
the subcarrier spacing. The theory for OFDM is developed in the complex 
baseband, but the physical communications occur in a passband centered 




around some carrier frequency $f_c$. We can obtain the corresponding subcarrier frequencies by shifting the entire spectrum to be centered around that frequency. The resulting real passband representation is illustrated in Figure 7.5(b), which shows the positive subcarrier frequencies around $+f_c$ (there is also a copy around $-f_c$, as illustrated in Figure 2.9).
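The index-to-frequency mapping in (7.21) and the passband shift described above can be sketched in Python as follows. The function names are illustrative, and the carrier frequency in the usage example is an arbitrary assumption. For even S, the subcarrier ν = S/2 is mapped to −B/2, in line with (7.22).

```python
def baseband_frequency(nu, S, B):
    """Frequency of subcarrier nu in the complex baseband, following (7.21)."""
    if 0 <= nu < S / 2:
        return nu * B / S
    return (nu - S) * B / S


def passband_frequency(nu, S, B, fc):
    """Positive passband frequency of subcarrier nu around the carrier frequency fc."""
    return fc + baseband_frequency(nu, S, B)


S, B = 4, 60e3                        # subcarrier spacing B/S = 15 kHz
print([baseband_frequency(nu, S, B) for nu in range(S)])
# [0.0, 15000.0, -30000.0, -15000.0]
print(passband_frequency(1, S, B, fc=3.5e9))   # 3500015000.0
```

The output lists the indices in DFT order, which is the counting order described in the caption of Figure 7.5. Sorting the frequencies gives the symmetric list in (7.22).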

When transmitting a large amount of data, the OFDM system operation 
is divided into many consecutive blocks, each managed as described above. 
Each block is called an OFDM symbol. The structure of an OFDM symbol is 
illustrated in Figure 7.6, which shows both the time- and frequency-domain 
representations. Since we transmit T + S time-domain symbols with a symbol 
rate of B Hz, the total time duration of an OFDM symbol is (T + S)/B 
seconds. The OFDM symbol spans the entire bandwidth B, as shown in 
Figure 7.6(a). Since each time-domain symbol has a duration of 1/B seconds and a bandwidth of B Hz, it covers an area of (1/B) · B = 1 in the time-frequency plane. This unit area is dimensionless but is sometimes called one complex
degree of freedom because it represents the minimum component from which 
time-frequency signals can be created. Just as any molecule is made of a group 
of atoms, any communication signal is made from a group of complex degrees 
of freedom. 

Figure 7.6(b) shows how $\bar{x}[0],\ldots,\bar{x}[S-1]$ represent the transmitted signals over S subcarriers. The subcarriers are equally spaced over the frequency domain, each utilizing a bandwidth of B/S Hz. Since an OFDM symbol has a time duration of (T + S)/B seconds, each subcarrier covers an area of

$$\frac{T+S}{B} \cdot \frac{B}{S} = 1 + \frac{T}{S} \text{ degrees of freedom} \tag{7.23}$$

in the time-frequency plane. This is larger than the unit area of a time-domain symbol because each OFDM symbol consists of a sequence of T + S time-domain symbols, of which T symbols are sacrificed in the cyclic prefix to remove inter-symbol interference. However, if we select S ≫ T, then 1 + T/S ≈ 1 so that the loss is small in relative terms.

The complete transmitter and receiver implementations of OFDM are 
illustrated in Figures 7.7(a) and (b), respectively. The transmitter first encodes data into the S symbols $\bar{\mathbf{x}} = [\bar{x}[0],\ldots,\bar{x}[S-1]]^T$. It then computes the IDFT to obtain $\mathbf{x} = [x[0],\ldots,x[S-1]]^T = \mathbf{F}_S^H\bar{\mathbf{x}}$. The transmitter then appends the cyclic prefix to obtain a sequence x[S − T],...,x[S − 1], x[0],...,x[S − 1] of T + S time-domain symbols, which are transmitted serially over the communication channel. The receiver stores a sequence of T + S time-domain symbols y[−T],...,y[S − 1] but discards the cyclic prefix to obtain $\mathbf{y} = [y[0],\ldots,y[S-1]]^T$. It then computes the frequency-domain signals $\bar{\mathbf{y}} = [\bar{y}[0],\ldots,\bar{y}[S-1]]^T = \mathbf{F}_S\mathbf{y}$ using the DFT.
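The transmitter and receiver chains of Figure 7.7 can be combined into a noise-free Python sketch: IDFT, cyclic prefix, the FIR channel (7.7), removal of the prefix, and DFT. Several choices here are illustrative rather than taken from the text: the direct O(S²) transforms, the function names, and the example taps. The same applies to the per-subcarrier division used to recover the data, a simple zero-forcing step that presumes every $\bar{h}[\nu] \neq 0$.

```python
import cmath
import math


def idft(xbar):
    """x = F_S^H xbar with the unitary scaling 1/sqrt(S)."""
    S = len(xbar)
    return [sum(xbar[nu] * cmath.exp(2j * math.pi * s * nu / S) for nu in range(S))
            / math.sqrt(S) for s in range(S)]


def dft(x):
    """ybar = F_S y with the unitary scaling 1/sqrt(S)."""
    S = len(x)
    return [sum(x[s] * cmath.exp(-2j * math.pi * s * nu / S) for s in range(S))
            / math.sqrt(S) for nu in range(S)]


def ofdm_symbol(xbar, h):
    """Noise-free transmission of one OFDM symbol over the taps h[0..T]."""
    S, T = len(xbar), len(h) - 1
    x = idft(xbar)
    tx = x[S - T:] + x                                   # cyclic prefix (7.8)
    rx = [sum(h[i] * tx[l - i] for i in range(T + 1) if l - i >= 0)
          for l in range(len(tx))]                       # FIR channel (7.7)
    return dft(rx[T:])                                   # drop prefix, then DFT


h = [0.8, 0.5j, -0.2]                                    # T = 2
xbar = [1, -1, 1j, -1j, 1, 1, -1, 1j]                    # S = 8 data symbols
ybar = ofdm_symbol(xbar, h)
S = len(xbar)
hbar = [sum(h[l] * cmath.exp(-2j * math.pi * l * nu / S) for l in range(len(h)))
        for nu in range(S)]                              # (7.15)
xhat = [ybar[nu] / hbar[nu] for nu in range(S)]          # one complex division per subcarrier
print(max(abs(a - b) for a, b in zip(xhat, xbar)) < 1e-9)   # True
```

The single division per subcarrier works because each output only depends on the input with the same index ν, which is what (7.17) states.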

The IDFT $\mathbf{x} = \mathbf{F}_S^H\bar{\mathbf{x}}$ at the transmitter and DFT $\bar{\mathbf{y}} = \mathbf{F}_S\mathbf{y}$ at the receiver are obtained as matrix-vector multiplications. The multiplication between an
S x S matrix and an S-length vector generally requires the computation of 
(a) Time-domain representation of an OFDM symbol.

(b) Frequency-domain representation of an OFDM symbol.

Figure 7.6: OFDM systems divide the transmission into blocks called OFDM symbols, which span the entire bandwidth B and T + S time-domain symbols. This block is utilized to generate S memoryless subcarrier channels in the frequency domain.




(a) Transmitter. (b) Receiver.

Figure 7.7: Block diagrams of the transmitter and receiver in an OFDM system.


S² multiplications and S(S − 1) additions, but the DFT matrix has a special structure with repeated entries that can be utilized to lower the computational complexity. In particular, there is a classical algorithm called the fast Fourier transform [106] that computes the DFT or IDFT using a number of arithmetic operations proportional to $S\log_2(S)$ instead of $S^2$. This fast implementation is typically used in practical systems.
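To illustrate the fast implementation, here is a minimal recursive radix-2 Cooley-Tukey FFT in Python. It is one member of the FFT family and assumes that S is a power of two. The 1/√S scaling is applied afterwards so that it matches the unitary DFT matrix assumed in the other sketches. The function names are illustrative.

```python
import cmath
import math


def fft(x):
    """Unnormalized radix-2 FFT: sum_s x[s] exp(-j 2 pi nu s / S); len(x) = 2^n."""
    S = len(x)
    if S == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])   # two half-length DFTs
    out = [0j] * S
    for nu in range(S // 2):
        twiddled = cmath.exp(-2j * math.pi * nu / S) * odd[nu]
        out[nu] = even[nu] + twiddled              # butterfly, upper half
        out[nu + S // 2] = even[nu] - twiddled     # butterfly, lower half
    return out


def unitary_dft_fast(x):
    """F_S x computed with O(S log S) operations."""
    return [v / math.sqrt(len(x)) for v in fft(x)]


def unitary_dft_direct(x):
    """F_S x computed as a matrix-vector product with O(S^2) operations."""
    S = len(x)
    return [sum(x[s] * cmath.exp(-2j * math.pi * nu * s / S) for s in range(S))
            / math.sqrt(S) for nu in range(S)]


x = [complex(s % 3, -s) for s in range(16)]
err = max(abs(a - b) for a, b in zip(unitary_dft_fast(x), unitary_dft_direct(x)))
print(err < 1e-9)   # True
```

The recursion halves the problem at each of the log₂(S) levels and does O(S) work per level, which gives the operation count quoted above. The IDFT can be obtained from the same routine by conjugating both the input and the output.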


Example 7.2. The OFDM symbols in 4G LTE and 5G NR use the subcarrier spacing B/S = 15 kHz, irrespective of the bandwidth; thus, the block length S grows proportionally to the bandwidth B. The cyclic prefix is selected to have the time duration 4.69 μs, which specifies a particular largest admissible delay spread and corresponds to $T \approx B \cdot 4.69 \cdot 10^{-6}$. How many complex degrees of freedom does each subcarrier utilize?

We can compute the complex degrees of freedom directly using (7.23) as

$$1 + \frac{T}{S} = 1 + 4.69 \cdot 10^{-6} \cdot \frac{B}{S} = 1 + 4.69 \cdot 10^{-6} \cdot 15 \cdot 10^{3} \approx 1.07. \tag{7.24}$$

This indicates that the cyclic prefix increases the utilization of signal resources by 7%, which is the price to pay for dealing with inter-symbol interference.

There are alternative OFDM configurations in 5G NR [107], including 
an extended cyclic prefix option that can be selected to manage larger delay 
spreads and increased subcarrier spacings (by a factor of $2^n$ for $n \in \{1,2,3,4\}$)
to handle latency-critical services and small-cell deployments with low delay 
spread. In the latter case, the cyclic prefix is shortened accordingly to maintain 
the same number of degrees of freedom per subcarrier. 
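The arithmetic in (7.24) and the scaling rule for the NR subcarrier spacings can be checked with a few lines of Python. The sketch assumes, as stated above, that the cyclic prefix duration shrinks by the same factor $2^n$ by which the subcarrier spacing grows. The function name is illustrative.

```python
def dof_per_subcarrier(cp_duration, subcarrier_spacing):
    """(7.23) with T/S = cp_duration * (B/S): complex degrees of freedom per subcarrier."""
    return 1 + cp_duration * subcarrier_spacing


base = dof_per_subcarrier(4.69e-6, 15e3)
print(round(base, 2))   # 1.07
for n in range(1, 5):   # spacing 15 kHz * 2^n with the prefix shortened by 2^n
    print(n, round(dof_per_subcarrier(4.69e-6 / 2**n, 15e3 * 2**n), 2))
```

The product of the prefix duration and the subcarrier spacing is unchanged by the scaling, so the overhead stays at about 7% for every configuration.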

There are other multicarrier modulation schemes than OFDM, and some remove the cyclic prefix to increase the resource efficiency; however, this can only increase the capacity by around 7% in the configuration of Example 7.2. A more important reason to avoid
OFDM is that the IDFT operation creates a time-domain signal with relatively 
large power variations, which makes it hard to build efficient power amplifiers. 
Hence, some low-power communication systems use other modulation schemes. 


502 Wideband MIMO Channels and Practical Aspects 


7.1.2 Capacity of SISO-OFDM Channels 


We will now determine the channel capacity of the OFDM system in (7.17), 
which we will refer to as the SISO-OFDM channel because we have a single- 
antenna transmitter and a single-antenna receiver. By using DFT matrices 
for transmitter and receiver processing, as illustrated in Figure 7.4, we create 
the S memoryless subcarrier channels 


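$$\bar{y}[\nu] = \bar{h}[\nu]\,\bar{x}[\nu] + \bar{n}[\nu], \quad \text{for } \nu = 0,\ldots,S-1. \tag{7.25}$$

Suppose we use the symbol power $q_\nu$ when sending the data symbol $\bar{x}[\nu]$ at subcarrier ν; that is, $\mathbb{E}\{|\bar{x}[\nu]|^2\} = q_\nu$. We can then utilize Corollary 2.1 to conclude that the resulting data rate at subcarrier ν is

$$\log_2\left(1 + \frac{q_\nu |\bar{h}[\nu]|^2}{N_0}\right) \text{ bit per subcarrier symbol}. \tag{7.26}$$

This rate is achieved when the data symbol is distributed as $\bar{x}[\nu] \sim \mathcal{N}_{\mathbb{C}}(0, q_\nu)$. The accumulated data rate within one OFDM symbol is the summation of (7.26) for all S subcarriers:

$$\sum_{\nu=0}^{S-1} \log_2\left(1 + \frac{q_\nu |\bar{h}[\nu]|^2}{N_0}\right) \text{ bit per OFDM symbol}. \tag{7.27}$$

Since each OFDM symbol has a time duration of (T + S)/B seconds, we can equivalently express (7.27) as

$$\frac{B}{T+S} \sum_{\nu=0}^{S-1} \log_2\left(1 + \frac{q_\nu |\bar{h}[\nu]|^2}{N_0}\right) \text{ bit/s}. \tag{7.28}$$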
This expression is almost the bandwidth B multiplied by the average rate 
of the S subcarriers, but we are dividing by T + S instead of S, which is 
the price to pay for the cyclic prefix. We have referred to (7.26)–(7.28) as data rates, not the capacities, because we initially assumed arbitrary symbol powers $q_0,\ldots,q_{S-1}$ on the subcarriers. Since the channel capacity is the maximum data rate, it can be obtained by maximizing (7.27) with respect to all permissible ways of selecting these power parameters. We used q in previous chapters to denote the maximum symbol power in the time domain. The corresponding requirement in the OFDM case is that $\mathbb{E}\{|x[s]|^2\} \leq q$ for the time-domain symbols, for s = 0,...,S − 1. We can utilize the definition $x[s] = \frac{1}{\sqrt{S}}\sum_{\nu=0}^{S-1} \bar{x}[\nu]\, e^{j2\pi s\nu/S}$ of the IDFT to connect this requirement to the data symbols $\bar{x}[0],\ldots,\bar{x}[S-1]$ that are transmitted in the frequency domain:

$$\mathbb{E}\{|x[s]|^2\} = \mathbb{E}\left\{\left|\frac{1}{\sqrt{S}}\sum_{\nu=0}^{S-1} \bar{x}[\nu]\, e^{j2\pi s\nu/S}\right|^2\right\} = \frac{1}{S}\sum_{\nu=0}^{S-1} \mathbb{E}\{|\bar{x}[\nu]|^2\} = \frac{1}{S}\sum_{\nu=0}^{S-1} q_\nu, \tag{7.29}$$
where we utilized the fact that the data symbols are independent and have zero mean when achieving the aforementioned rates. The conclusion is that we can select the (non-negative) symbol powers $q_0,\ldots,q_{S-1}$ at the different subcarriers freely under the power constraint

$$\frac{1}{S}\sum_{\nu=0}^{S-1} q_\nu \leq q. \tag{7.30}$$

This constraint says that the average power over the subcarriers must not exceed the maximum power per time-domain symbol. Another way to phrase it is that the sum power of the S subcarriers should be smaller than or equal to the power of S time-domain symbols: $\sum_{\nu=0}^{S-1} q_\nu \leq qS$. The capacity of the OFDM channel (in bit per OFDM symbol) is therefore obtained by maximizing the sum rate of S memoryless channels under a sum power constraint:

$$C = \max_{\substack{q_0,\ldots,q_{S-1} \geq 0\\ \sum_{\nu=0}^{S-1} q_\nu \leq qS}} \ \sum_{\nu=0}^{S-1} \log_2\left(1 + \frac{q_\nu |\bar{h}[\nu]|^2}{N_0}\right). \tag{7.31}$$

Apart from a somewhat different notation, this is precisely what we did when considering the point-to-point MIMO capacity in Theorem 3.1. The optimal solution was obtained by the water-filling power allocation:

$$q_\nu^{\text{opt}} = \max\left(\mu - \frac{N_0}{|\bar{h}[\nu]|^2},\, 0\right), \quad \nu = 0,\ldots,S-1, \tag{7.32}$$

where the variable μ is selected to make $\sum_{\nu=0}^{S-1} q_\nu^{\text{opt}} = qS$.

When we previously utilized water-filling to achieve the MIMO channel 
capacity, we divided the power between different spatial dimensions. It is 
common that a few spatial dimensions are much stronger than the other 
dimensions since more power reaches the receiver when transmitting towards 
some specific multipath clusters. This can result in only allocating power to a 
subset of the subchannels, particularly at low SNRs or when considering LOS 
channels. In contrast, the S subcarrier channels $\bar{h}[0],\ldots,\bar{h}[S-1]$ are often of similar strength because they are all created as linear combinations of the same channel taps, as can be seen from (7.15). Hence, except in very low SNR scenarios, we will allocate power to all subcarriers. When that happens, we can utilize (3.72) to identify the optimal value of μ in (7.32):

$$\mu = q + \frac{1}{S}\sum_{\nu=0}^{S-1} \frac{N_0}{|\bar{h}[\nu]|^2}. \tag{7.33}$$
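Below is a minimal Python sketch of the water-filling solution (7.32). It starts from the closed form (7.33), which is valid when every subcarrier is active, and switches off the weakest subcarrier until all powers are non-negative. This iterative search for μ is a standard procedure but is not spelled out in the text, and the function names are illustrative. All gains $|\bar{h}[\nu]|^2$ are assumed to be positive.

```python
import math


def water_filling(gains, q, N0):
    """Powers q_nu from (7.32) for gains |bar_h[nu]|^2 > 0, with sum = q * S."""
    S = len(gains)
    active = sorted(range(S), key=lambda nu: gains[nu], reverse=True)
    while True:
        # Generalization of (7.33) to the currently active subcarriers.
        mu = (q * S + sum(N0 / gains[nu] for nu in active)) / len(active)
        if N0 / gains[active[-1]] <= mu:
            break
        active.pop()                      # the weakest subcarrier gets no power
    return [max(mu - N0 / g, 0.0) for g in gains]


def ofdm_rate(gains, powers, N0, B, T):
    """Data rate (7.28) in bit/s for given subcarrier powers."""
    S = len(gains)
    return B / (T + S) * sum(math.log2(1 + p * g / N0) for p, g in zip(powers, gains))


gains = [2.0, 1.0, 0.5, 0.01]             # |bar_h[nu]|^2 for S = 4 subcarriers
powers = water_filling(gains, q=1.0, N0=1.0)
print(powers)                             # [2.0, 1.5, 0.5, 0.0]
print(ofdm_rate(gains, powers, N0=1.0, B=1.0, T=0))
```

The very weak fourth subcarrier receives no power, which corresponds to the very low SNR situation mentioned above. When all subcarriers are active, the first evaluation of μ coincides with (7.33).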


The channel capacity of a SISO-OFDM system can be summarized as follows. 
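Theorem 7.1. Consider the SISO-OFDM system in Figure 7.3 with input $\mathbf{x} \in \mathbb{C}^S$ and output $\mathbf{y} \in \mathbb{C}^S$ given by

$$\mathbf{y} = \mathbf{C}_h\mathbf{x} + \mathbf{n}, \tag{7.34}$$

where $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_S)$ is independent noise. Suppose the input distribution is feasible whenever the symbol power satisfies $\mathbb{E}\{\|\mathbf{x}\|^2\} \leq qS$. The channel matrix $\mathbf{C}_h$ has the eigenvalues $\bar{h}[0],\ldots,\bar{h}[S-1]$ given by (7.15) and the corresponding eigenvectors are the columns of the IDFT matrix $\mathbf{F}_S^H$. If the channel matrix is constant and known at the input and output, the channel capacity is

$$C = \frac{B}{T+S}\sum_{\nu=0}^{S-1} \log_2\left(1 + \frac{q_\nu^{\text{opt}} |\bar{h}[\nu]|^2}{N_0}\right) \text{ bit/s}, \tag{7.35}$$

where T is the length of the cyclic prefix,

$$q_\nu^{\text{opt}} = \max\left(\mu - \frac{N_0}{|\bar{h}[\nu]|^2},\, 0\right), \quad \nu = 0,\ldots,S-1, \tag{7.36}$$

and the variable μ is selected to make $\sum_{\nu=0}^{S-1} q_\nu^{\text{opt}} = qS$.

The capacity is achieved by the input distribution $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{F}_S^H\mathbf{Q}^{\text{opt}}\mathbf{F}_S)$, where $\mathbf{Q}^{\text{opt}} = \operatorname{diag}(q_0^{\text{opt}},\ldots,q_{S-1}^{\text{opt}})$ is an S × S diagonal matrix.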


In summary, OFDM is the capacity-achieving way to communicate over 
wideband SISO channels under the assumption that a cyclic prefix is appended 
to the data transmission. We observed this by rewriting the transmission of 
a block of S symbols into a MIMO-like matrix form and showing that the 
resulting channel matrix $\mathbf{C}_h$ is diagonalized by DFT and IDFT operations at the receiver and transmitter, respectively. We then obtain S parallel subcarrier
channels, similar to Section 3.4, and achieve the capacity by dividing the 
power between them using water-filling. 


7.2 Capacity of MIMO-OFDM Channels 


We will now extend the capacity analysis from the last section to cover OFDM 
systems with multiple antennas at both the transmitter and the receiver. When 
there are M receive antennas, each of the received signals can be modeled 
using an FIR channel filter as in (7.7), but with the essential difference that 
signals are received simultaneously from K transmit antennas. Hence, the 
received signal on antenna m at time l can be expressed as 


$$y_m[l] = \sum_{\ell=0}^{T}\sum_{k=1}^{K} h_{m,k}[\ell]\,x_k[l-\ell] + n_m[l], \tag{7.37}$$

where $h_{m,k}[0],\ldots,h_{m,k}[T]$ are the channel coefficients between receive antenna m and transmit antenna k, $x_k[l]$ is the transmitted signal from antenna k at time l, and $n_m[l] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is the independent receiver noise. This is a messy system model because the received signal at a given time instance l depends on the signals transmitted from K antennas at T + 1 time instances. However, we can resolve this inter-symbol interference as in the SISO case.

Suppose a T-length cyclic prefix is applied in accordance with (7.8). Then the collection of received signals at antenna m in a block containing S time-domain symbols can be expressed as

$$\underbrace{\begin{bmatrix} y_m[0]\\ \vdots\\ y_m[S-1] \end{bmatrix}}_{=\mathbf{y}_m} = \sum_{k=1}^{K} \mathbf{C}_{h_{m,k}} \underbrace{\begin{bmatrix} x_k[0]\\ \vdots\\ x_k[S-1] \end{bmatrix}}_{=\mathbf{x}_k} + \underbrace{\begin{bmatrix} n_m[0]\\ \vdots\\ n_m[S-1] \end{bmatrix}}_{=\mathbf{n}_m}, \tag{7.38}$$

where $\mathbf{x}_k \in \mathbb{C}^S$ is the signal sequence transmitted from antenna k, $\mathbf{n}_m \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_S)$ is the receiver noise, and the channel between receive antenna m and transmit antenna k is represented by the S × S circulant matrix


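$$\mathbf{C}_{h_{m,k}} = \begin{bmatrix} h_{m,k}[0] & h_{m,k}[S-1] & \cdots & h_{m,k}[2] & h_{m,k}[1]\\ h_{m,k}[1] & h_{m,k}[0] & h_{m,k}[S-1] & \cdots & h_{m,k}[2]\\ \vdots & h_{m,k}[1] & h_{m,k}[0] & \ddots & \vdots\\ h_{m,k}[S-2] & & \ddots & \ddots & h_{m,k}[S-1]\\ h_{m,k}[S-1] & h_{m,k}[S-2] & \cdots & h_{m,k}[1] & h_{m,k}[0] \end{bmatrix}. \tag{7.39}$$

This matrix has the same shape as (7.11), which implies that its eigenvectors also coincide with the columns of the IDFT matrix $\mathbf{F}_S^H$ in (2.198). In particular, the eigendecomposition of $\mathbf{C}_{h_{m,k}}$ is

$$\mathbf{C}_{h_{m,k}} = \mathbf{F}_S^H \mathbf{D}_{h_{m,k}} \mathbf{F}_S, \tag{7.40}$$

where the diagonal matrix $\mathbf{D}_{h_{m,k}} = \operatorname{diag}(\bar{h}_{m,k}[0],\ldots,\bar{h}_{m,k}[S-1])$ contains the frequency response coefficients of the FIR filter that describes the channel:

$$\bar{h}_{m,k}[\nu] = \sum_{\ell=0}^{T} h_{m,k}[\ell]\, e^{-j2\pi \ell\nu/S}, \quad \nu = 0,\ldots,S-1. \tag{7.41}$$

This implies that we can diagonalize the matrix $\mathbf{C}_{h_{m,k}}$ by considering signals transmitted and received in the frequency domain instead of the time domain. If we express the DFT of the transmitted signal at antenna k as $\bar{\mathbf{x}}_k = \mathbf{F}_S\mathbf{x}_k$, we can write the DFT $\bar{\mathbf{y}}_m = \mathbf{F}_S\mathbf{y}_m$ of the received signal in (7.38) as

$$\bar{\mathbf{y}}_m = \begin{bmatrix} \bar{y}_m[0]\\ \vdots\\ \bar{y}_m[S-1] \end{bmatrix} = \mathbf{F}_S\left(\sum_{k=1}^{K} \mathbf{C}_{h_{m,k}}\mathbf{F}_S^H\bar{\mathbf{x}}_k + \mathbf{n}_m\right) = \sum_{k=1}^{K} \mathbf{D}_{h_{m,k}}\bar{\mathbf{x}}_k + \bar{\mathbf{n}}_m, \tag{7.42}$$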
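where we utilized (7.40) and denoted the noise vector in the frequency domain as $\bar{\mathbf{n}}_m = [\bar{n}_m[0],\ldots,\bar{n}_m[S-1]]^T = \mathbf{F}_S\mathbf{n}_m \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_S)$. We notice that the νth entry in $\bar{\mathbf{y}}_m$ is independent of the other entries in the sense of only depending on variables with the same index. Hence, we can separately describe the signals received over the M receive antennas at subcarrier ν as

$$\underbrace{\begin{bmatrix} \bar{y}_1[\nu]\\ \vdots\\ \bar{y}_M[\nu] \end{bmatrix}}_{=\bar{\mathbf{y}}[\nu]} = \underbrace{\begin{bmatrix} \bar{h}_{1,1}[\nu] & \cdots & \bar{h}_{1,K}[\nu]\\ \vdots & \ddots & \vdots\\ \bar{h}_{M,1}[\nu] & \cdots & \bar{h}_{M,K}[\nu] \end{bmatrix}}_{=\bar{\mathbf{H}}[\nu]} \underbrace{\begin{bmatrix} \bar{x}_1[\nu]\\ \vdots\\ \bar{x}_K[\nu] \end{bmatrix}}_{=\bar{\mathbf{x}}[\nu]} + \underbrace{\begin{bmatrix} \bar{n}_1[\nu]\\ \vdots\\ \bar{n}_M[\nu] \end{bmatrix}}_{=\bar{\mathbf{n}}[\nu]}, \tag{7.43}$$

which we can write in short form as

$$\bar{\mathbf{y}}[\nu] = \bar{\mathbf{H}}[\nu]\,\bar{\mathbf{x}}[\nu] + \bar{\mathbf{n}}[\nu], \quad \nu = 0,\ldots,S-1. \tag{7.44}$$

This looks precisely like a MIMO channel of the kind considered in Section 3.4, but it is based on the frequency-domain channel matrix $\bar{\mathbf{H}}[\nu] \in \mathbb{C}^{M \times K}$ that is a weighted sum of the channel matrices at the different channel taps:

$$\bar{\mathbf{H}}[\nu] = \sum_{\ell=0}^{T} \mathbf{H}[\ell]\, e^{-j2\pi \ell\nu/S}, \quad \nu = 0,\ldots,S-1, \tag{7.45}$$

where $\mathbf{H}[\ell] \in \mathbb{C}^{M \times K}$ is the time-domain channel matrix at the tap with index ℓ, whose (m,k)th entry is $h_{m,k}[\ell]$. The matrices $\bar{\mathbf{H}}[0],\ldots,\bar{\mathbf{H}}[S-1]$ constitute the frequency response of the considered MIMO channel.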
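Equation (7.45) applies (7.41) entry by entry, as the following Python sketch makes explicit for channel taps stored as nested lists. The function name and the list-of-rows storage are illustrative assumptions. The usage example checks that the 1 × 1 case reproduces the second channel of Example 7.1.

```python
import cmath
import math


def mimo_frequency_response(H_taps, S):
    """(7.45): bar_H[nu] = sum_l H[l] exp(-j 2 pi l nu / S) for nu = 0, ..., S - 1.

    H_taps holds T + 1 matrices H[0], ..., H[T], each an M x K list of rows.
    """
    M, K = len(H_taps[0]), len(H_taps[0][0])
    return [[[sum(H_taps[l][m][k] * cmath.exp(-2j * math.pi * l * nu / S)
                  for l in range(len(H_taps)))
              for k in range(K)]
             for m in range(M)]
            for nu in range(S)]


# Single-antenna check: taps h[0] = h[1] = 1 as in Example 7.1.
Hbar = mimo_frequency_response([[[1.0]], [[1.0]]], 4)
print([round(abs(Hbar[nu][0][0]), 4) for nu in range(4)])   # [2.0, 1.4142, 0.0, 1.4142]
```

With a single tap (T = 0), every $\bar{\mathbf{H}}[\nu]$ equals $\mathbf{H}[0]$, which is the frequency-flat MIMO channel of earlier chapters.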

Thanks to the cyclic prefix, we managed to rewrite the system model in 
(7.37) for one S-length block with inter-symbol interference into the S separate 
subcarrier channels in (7.44). This is the model of a MIMO-OFDM system 
and is summarized in Figure 7.8. The subcarriers are mutually independent 
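in the sense of depending on different signal vectors $\bar{\mathbf{x}}[\nu]$ and independent noise terms $\bar{\mathbf{n}}[\nu]$. The only thing that couples them is the power budget of the transmitter: the total energy per block is limited to qS. This power constraint can be expressed in different ways:

$$\sum_{k=1}^{K} \mathbb{E}\{\|\mathbf{x}_k\|^2\} = \sum_{k=1}^{K} \mathbb{E}\{\|\bar{\mathbf{x}}_k\|^2\} = \sum_{\nu=0}^{S-1} \mathbb{E}\{\|\bar{\mathbf{x}}[\nu]\|^2\} \leq qS. \tag{7.46}$$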


The first and second summations consider the time-domain and frequency- 
domain signals at the K antennas, respectively. The third summation considers 
the frequency-domain signals at the S subcarriers. It showcases that the power limit applies to the average sum of the symbol powers $\|\bar{\mathbf{x}}[\nu]\|^2$ per subcarrier.
We know from Theorem 3.1 that the capacity of a point-to-point MIMO 
channel is achieved by transmitting in the right singular vector directions and 
applying water-filling power allocation. The same can be done in the OFDM 
case, with the only exception that water-filling is carried out by considering 
all subcarriers and their respective spatial dimensions. 
Figure 7.8: A MIMO-OFDM system can be represented as S parallel MIMO channels.


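Theorem 7.2. Consider the point-to-point MIMO-OFDM system in Figure 7.8, where subcarrier $\nu$ has the input $\mathbf{x}[\nu] \in \mathbb{C}^K$ and output $\mathbf{y}[\nu] \in \mathbb{C}^M$ given by

$$\mathbf{y}[\nu] = \mathbf{H}[\nu]\mathbf{x}[\nu] + \mathbf{n}[\nu], \quad \nu = 0,\ldots,S-1, \tag{7.47}$$

where $\mathbf{n}[\nu] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_M)$ is independent noise. Suppose the input distribution is feasible whenever $\sum_{\nu=0}^{S-1}\mathbb{E}\{\|\mathbf{x}[\nu]\|^2\} \leq qS$. The channel matrices $\mathbf{H}[\nu]$ are constant and known at the transmitter and receiver. Let the $r_\nu$ non-zero singular values of $\mathbf{H}[\nu]$ be denoted as $s_{\nu,1},\ldots,s_{\nu,r_\nu}$. The channel capacity is

$$C = \frac{B}{S+T}\sum_{\nu=0}^{S-1}\sum_{k=1}^{r_\nu}\log_2\left(1 + \frac{q_{\nu,k}^{\mathrm{opt}} s_{\nu,k}^2}{N_0}\right) \ \text{bit/s}, \tag{7.48}$$

where $T$ is the length of the cyclic prefix,

$$q_{\nu,k}^{\mathrm{opt}} = \max\left(\mu - \frac{N_0}{s_{\nu,k}^2}, 0\right), \quad \nu = 0,\ldots,S-1, \ k = 1,\ldots,r_\nu, \tag{7.49}$$

and the variable $\mu$ is selected to make $\sum_{\nu=0}^{S-1}\sum_{k=1}^{r_\nu} q_{\nu,k}^{\mathrm{opt}} = qS$. The capacity is achieved by the input distribution $\mathbf{x}[\nu] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{V}_\nu\mathbf{Q}_\nu^{\mathrm{opt}}\mathbf{V}_\nu^{\mathrm{H}})$, where $\mathbf{Q}_\nu^{\mathrm{opt}} = \mathrm{diag}(q_{\nu,1}^{\mathrm{opt}},\ldots,q_{\nu,r_\nu}^{\mathrm{opt}},0,\ldots,0)$ is a $K \times K$ diagonal matrix and $\mathbf{V}_\nu$ contains the ordered right singular vectors of $\mathbf{H}[\nu]$.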
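Theorem 7.2 can be turned into a short numerical routine. The Python/NumPy sketch below computes the singular values of every H[ν]. It finds the water level μ in (7.49) by bisection so that the powers sum to qS, and then evaluates (7.48). The function name `mimo_ofdm_capacity`, the bisection approach, and the random test channel are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def mimo_ofdm_capacity(H, q, N0, B, T):
    """Water-filling capacity (7.48)-(7.49) over all subcarriers and spatial dimensions.

    H: complex array of shape (S, M, K) with H[nu] along the first axis.
    q: average power per subcarrier (total budget q*S), N0: noise power,
    B: bandwidth in Hz, T: cyclic-prefix length in samples.
    """
    S = H.shape[0]
    s = np.linalg.svd(H, compute_uv=False)   # singular values of every H[nu]
    gains = s[s > 1e-12] ** 2                 # s_{nu,k}^2 of all non-zero singular values
    floors = N0 / gains
    # Bisection on the water level mu so that the allocated powers sum to q*S
    lo, hi = 0.0, floors.max() + q * S
    for _ in range(200):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - floors, 0.0).sum() > q * S:
            hi = mu
        else:
            lo = mu
    q_opt = np.maximum(mu - floors, 0.0)      # (7.49), flattened over (nu, k)
    C = B / (S + T) * np.sum(np.log2(1 + q_opt * gains / N0))  # (7.48)
    return C, q_opt

# Usage with an assumed i.i.d. Rayleigh fading channel
rng = np.random.default_rng(1)
S, M, K = 16, 2, 2
H = (rng.standard_normal((S, M, K)) + 1j * rng.standard_normal((S, M, K))) / np.sqrt(2)
C, q_opt = mimo_ofdm_capacity(H, q=1.0, N0=0.1, B=1e6, T=4)

# Sanity check: one SISO subcarrier with unit gain, q = N0 = B = 1 and no cyclic prefix
C_siso, _ = mimo_ofdm_capacity(np.ones((1, 1, 1)), q=1.0, N0=1.0, B=1.0, T=0)
```

The bisection exploits that the total allocated power is non-decreasing in μ. Any root-finding method would serve equally well.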


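If we instead use an arbitrary precoding matrix $\mathbf{P}_\nu$ and diagonal power allocation matrix $\mathbf{Q}_\nu$ on subcarrier $\nu$, the resulting achievable rate can be expressed similarly to (3.106) as

$$\frac{B}{S+T}\sum_{\nu=0}^{S-1}\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}[\nu]\mathbf{P}_\nu\mathbf{Q}_\nu\mathbf{P}_\nu^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}[\nu]\right)\right) \ \text{bit/s}. \tag{7.50}$$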
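The following function is a sketch of (7.50). It evaluates the rate for arbitrary precoders P_ν and power matrices Q_ν. The usage lines illustrate two properties that follow directly from the formula. With equal power on all K streams, any unitary precoder gives the same rate. With all power on a single stream, the dominant right singular vector is the best choice. The name `ofdm_rate` and all parameter values are hypothetical.

```python
import numpy as np

def ofdm_rate(H, P, Q, N0, B, T):
    """Achievable rate (7.50) in bit/s.

    H: (S, M, K) channels; P: (S, K, K) precoders P_nu; Q: (S, K, K) diagonal power matrices Q_nu.
    """
    S, M, _ = H.shape
    total = 0.0
    for nu in range(S):
        HP = H[nu] @ P[nu]
        A = np.eye(M) + HP @ Q[nu] @ HP.conj().T / N0
        _, logdet = np.linalg.slogdet(A)          # natural log of |det(A)|; det(A) > 0 here
        total += logdet / np.log(2)
    return B / (S + T) * total

rng = np.random.default_rng(2)
S, M, K, q, N0 = 8, 3, 3, 1.0, 0.1
H = (rng.standard_normal((S, M, K)) + 1j * rng.standard_normal((S, M, K))) / np.sqrt(2)
Z = rng.standard_normal((S, K, K)) + 1j * rng.standard_normal((S, K, K))
P_rand = np.stack([np.linalg.qr(z)[0] for z in Z])   # random unitary precoders
P_eye = np.stack([np.eye(K)] * S)
V = np.linalg.svd(H)[2].conj().transpose(0, 2, 1)    # right singular vectors of each H[nu]

Q_equal = np.stack([np.eye(K) * q / K] * S)                 # equal power on all K streams
Q_single = np.stack([np.diag([q] + [0.0] * (K - 1))] * S)   # all power on the first stream

r_eye = ofdm_rate(H, P_eye, Q_equal, N0, B=1.0, T=0)
r_rand = ofdm_rate(H, P_rand, Q_equal, N0, B=1.0, T=0)
r_svd_single = ofdm_rate(H, V, Q_single, N0, B=1.0, T=0)
r_rand_single = ofdm_rate(H, P_rand, Q_single, N0, B=1.0, T=0)
```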
Figure 7.9: Block diagrams of the main components between the baseband unit and the antennas when using the digital beamforming architecture with K = M = 4 antennas: (a) transmitter; (b) receiver.


7.2.1 Digital Beamforming Architecture 


The theory and algorithms in this and previous chapters were developed using 
discrete-time complex baseband signals and channel models. There is a direct 
mapping between these models and the real continuous-time passband models 
used for practical communications, which was described in Section 2.3.1. In 
practice, this transformation is done by a sequence of hardware components 
at the transmitter and receiver. Figure 7.9 exemplifies the digital beamforming 
architecture, where each antenna has a dedicated chain of components between 
itself and the baseband processor [108]. This versatile architecture is capable 
of implementing all the features considered in this book. 

At the transmitter, the discrete-time OFDM signal sequence is generated 
in the baseband unit (BBU) and then converted to an analog baseband signal 
using a digital-to-analog converter (DAC). The signal is then up-converted to 
the passband through multiplication with a sinusoidal carrier frequency signal 
generated by a local oscillator. The passband signal is then fed to a power 
amplifier (PA) that greatly increases the power before the signal reaches the 
antenna, which radiates it as an electromagnetic wave. Each antenna has a 
dedicated branch in Figure 7.9(a) with a DAC, up-converter, and PA. 

The receiver performs similar processing but in the opposite order. The 
receive antenna converts the incoming wave into an electric current that is 
typically very weak and, therefore, fed to a low-noise amplifier (LNA) for 
immediate amplification. Next, the signal is down-converted to the baseband 
through multiplication with a sinusoidal carrier frequency signal and lowpass 
filtering. Finally, the signal is sampled using an analog-to-digital converter 
(ADC), and the output signal sequence reaches the BBU. Each antenna has a 
dedicated branch in Figure 7.9(b) with an LNA, down-converter, and ADC. 
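The up- and down-conversion in each branch can be illustrated with a discrete-time simulation. This sketch assumes the passband mapping x_p(t) = √2 Re{x(t) e^{j2πf_c t}}. The √2 normalization is an assumption; Section 2.3.1 gives the convention used in this book. The sketch uses an ideal FFT-based lowpass filter in place of an analog one. It also picks small illustrative frequencies so that every component falls on an exact FFT bin.

```python
import numpy as np

fs, N = 1000.0, 1000          # assumed sample rate (Hz) and number of samples
fc = 100.0                    # assumed carrier frequency (Hz), well above the signal bandwidth
t = np.arange(N) / fs

# Complex baseband signal: two tones in a narrow band around 0 Hz
x = np.exp(2j * np.pi * 5 * t) + 0.5 * np.exp(-2j * np.pi * 3 * t)

# Up-conversion: real passband signal (sqrt(2) scaling is an assumed convention)
x_pass = np.sqrt(2) * np.real(x * np.exp(2j * np.pi * fc * t))

# Down-conversion: multiply with the carrier, then ideal lowpass filtering via the FFT
mixed = np.sqrt(2) * x_pass * np.exp(-2j * np.pi * fc * t)  # equals x + conj(x) * exp(-j 4 pi fc t)
spectrum = np.fft.fft(mixed)
freqs = np.fft.fftfreq(N, d=1 / fs)
spectrum[np.abs(freqs) > 50.0] = 0.0                          # remove the image around -2 fc
x_rec = np.fft.ifft(spectrum)

print(np.max(np.abs(x_rec - x)))  # numerically zero: the baseband signal is recovered
```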

This is a high-level description of the digital beamforming architecture, 
which highlights the essential processing blocks. In practice, there are also 
bandpass filters next to the amplifiers to reject out-of-band distortion. Sometimes, the down-conversion is done in two stages: first from the carrier frequency to an intermediate frequency, where the bandpass filtering can be done more conveniently, and then from the intermediate frequency to the baseband. The number of components of each kind is directly proportional to the number of antennas.
7.3 Clustered Multipath Propagation and Hybrid Beamforming


The clustered rich multipath propagation model was introduced in Section 5.6.1 to determine the MIMO channel matrix in an environment with $N_{\mathrm{cl}}$ clusters that scatter signals from the transmitter to the receiver. Cluster $i \in \{1,\ldots,N_{\mathrm{cl}}\}$ is located in the direction $(\varphi_{t,i},\theta_{t,i})$ seen from the transmitter and in the direction $(\varphi_{r,i},\theta_{r,i})$ seen from the receiver. In this section, we will extend this model to the OFDM case and explore what kind of hardware implementation is necessary to achieve the MIMO capacity.

We consider a ULA with $K$ antennas at the transmitter and a ULA with $M$ antennas at the receiver. The array response vectors are denoted as $\mathbf{a}_K(\varphi,\theta) \in \mathbb{C}^K$ and $\mathbf{a}_M(\varphi,\theta) \in \mathbb{C}^M$, respectively, and can be modeled as in (4.120). Each multipath cluster contains a large number of paths with varying delays, which are spread out so that the cluster might contribute to multiple channel taps. Hence, the channel matrix $\mathbf{H}[\ell] \in \mathbb{C}^{M\times K}$ at tap $\ell$ is modeled as


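$$\mathbf{H}[\ell] = \sum_{i=1}^{N_{\mathrm{cl}}} c_i[\ell]\,\mathbf{a}_M(\varphi_{r,i},\theta_{r,i})\mathbf{a}_K^{\mathrm{T}}(\varphi_{t,i},\theta_{t,i}), \quad \ell = 0,\ldots,T, \tag{7.51}$$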


which is a generalization of (5.187) where the new channel coefficient $c_i[\ell] \sim \mathcal{N}_{\mathbb{C}}(0, \beta_i[\ell])$ depends on the tap index. The shape of the sequence $\beta_i[0],\ldots,\beta_i[T]$ of variances is known as the power-delay profile. It is characterized by the cluster arrival time (i.e., the first tap index with a non-zero variance) and by how the power decays with time as the propagation distance increases. The channel model in (7.51) is commonly considered in the analysis of mmWave channels [109]. These channels typically contain fewer clusters than channels in the low-band and mid-band because of the greater penetration losses and negligible diffraction in the mmWave bands. We refer to [110], [111] for further motivations of this model, which is also appropriate for sub-THz bands [13].


Example 7.3. In the Saleh-Valenzuela model from [112], the power-delay profile is determined by the power-delay coefficients $\Gamma > 0$ and $\gamma > 0$ for the clusters and individual paths, respectively. Each cluster $i$ is associated with a discrete arrival time $t_i$, and the variances of the respective channel coefficients are given by

$$\beta_i[\ell] = \begin{cases} 0, & \ell < t_i, \\ \beta_0 e^{-\ell/\Gamma} e^{-(\ell - t_i)/\gamma}, & \ell \geq t_i. \end{cases} \tag{7.52}$$

The factor $\beta_0 e^{-\ell/\Gamma}$ describes how all channel gain coefficients decay exponentially with time since the waves spread out. The factor $e^{-(\ell - t_i)/\gamma}$ determines how much weaker the slower paths in a cluster are compared to the quickest path. This model was originally proposed for SISO channels but is commonly used with the clustered MIMO channel model in (7.51).
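The power-delay profile in (7.52) is easy to tabulate. The sketch below assumes the piecewise form in which β_i[ℓ] is zero before the arrival tap t_i and equals β_0 e^{-ℓ/Γ} e^{-(ℓ−t_i)/γ} afterwards. The numerical values of the reference gain and the two decay constants (`beta0`, `Gamma`, `gamma`) are illustrative assumptions, and so are the arrival taps.

```python
import numpy as np

def saleh_valenzuela_pdp(arrival_taps, T, beta0=1.0, Gamma=10.0, gamma=2.0):
    """Tap variances beta_i[l], l = 0, ..., T, for each cluster i, following (7.52).

    arrival_taps: the arrival tap t_i of each cluster. beta0, Gamma, and gamma are
    assumed example values for the reference gain and the two decay constants.
    """
    ell = np.arange(T + 1)
    pdp = np.zeros((len(arrival_taps), T + 1))
    for i, t_i in enumerate(arrival_taps):
        active = ell >= t_i                                        # zero before the arrival tap
        pdp[i, active] = (beta0 * np.exp(-ell[active] / Gamma)      # common decay with time
                          * np.exp(-(ell[active] - t_i) / gamma))  # decay within the cluster
    return pdp

pdp = saleh_valenzuela_pdp(arrival_taps=[0, 3, 7], T=12)
print(np.round(pdp, 3))
```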
If we substitute the time-domain channel matrices in (7.51) into (7.45), we obtain the channel matrices at each of the S OFDM subcarriers:

$$\mathbf{H}[\nu] = \sum_{i=1}^{N_{\mathrm{cl}}}\left(\sum_{\ell=0}^{T} c_i[\ell]e^{-j2\pi\ell\nu/S}\right)\mathbf{a}_M(\varphi_{r,i},\theta_{r,i})\mathbf{a}_K^{\mathrm{T}}(\varphi_{t,i},\theta_{t,i}), \quad \nu = 0,\ldots,S-1. \tag{7.53}$$

Each matrix is a weighted sum of all clusters, with the weights being the S-length DFT of the sequence $c_i[0],\ldots,c_i[T]$ of time-domain channel coefficients.


The weights vary with the subcarrier index, which creates frequency-dependent 
fading variations depending on whether the channel coefficients superimpose 
constructively or destructively. However, the large-scale geometric channel 
properties, such as the number of clusters, their angular directions, and 
average strength, are the same for all subcarriers. These large-scale properties 
primarily determine the rank of the channel matrix. For example, the rank of each matrix in (7.53) is upper bounded by $\min(M,K,N_{\mathrm{cl}})$. The rank equals $N_{\mathrm{cl}}$ when $\min(M,K) \geq N_{\mathrm{cl}}$ and the clusters have well-separated angles from both the transmitter's and receiver's perspective; see Figure 5.34(c) for an example. In situations where the number of clusters is small compared to the number of antennas, the channel matrix has rank at most $N_{\mathrm{cl}}$, and a simplified hardware architecture is sufficient to achieve the channel capacity.
These practical aspects will be the focus of the remainder of this chapter. 
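The construction of (7.53) and the rank bound can be checked numerically. This sketch builds the taps in (7.51) for three clusters. It obtains all S subcarrier matrices with a zero-padded FFT over the tap index and computes their ranks. The ULA response exp(−jπn sin φ cos θ), the angles, and the i.i.d. coefficients are assumptions made for illustration.

```python
import numpy as np

def ula(N, az, el=0.0):
    # Assumed half-wavelength ULA response with entries exp(-j*pi*n*sin(az)*cos(el))
    return np.exp(-1j * np.pi * np.arange(N) * np.sin(az) * np.cos(el))

rng = np.random.default_rng(3)
M, K, S, T = 8, 8, 64, 4
az_t = np.deg2rad([-40.0, 0.0, 35.0])   # assumed cluster angles seen from the transmitter
az_r = np.deg2rad([20.0, -30.0, 50.0])  # assumed cluster angles seen from the receiver
Ncl = len(az_t)

# Time-domain taps as in (7.51), with c_i[l] ~ CN(0, 1) drawn for illustration
c = (rng.standard_normal((Ncl, T + 1)) + 1j * rng.standard_normal((Ncl, T + 1))) / np.sqrt(2)
H_taps = np.zeros((T + 1, M, K), dtype=complex)
for i in range(Ncl):
    outer = np.outer(ula(M, az_r[i]), ula(K, az_t[i]))      # a_M a_K^T (no conjugation)
    H_taps += c[i][:, None, None] * outer[None, :, :]

# (7.53): S-point DFT over the tap axis; NumPy's fft uses exp(-j 2 pi l nu / S)
H_freq = np.fft.fft(H_taps, n=S, axis=0)
ranks = [int(np.linalg.matrix_rank(H_freq[nu])) for nu in range(S)]

# Direct evaluation of (7.53) on one subcarrier for comparison
nu = 5
H_direct = sum(H_taps[l] * np.exp(-2j * np.pi * l * nu / S) for l in range(T + 1))
```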


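Example 7.4. Suppose the channel coefficients are distributed as $c_i[\ell] \sim \mathcal{N}_{\mathbb{C}}(0, \beta_i[\ell])$ and independent across the clusters $i$ and tap indices $\ell$. What is the average squared Frobenius norm of the channel matrix in (7.53)? How does it depend on the subcarrier index?

Since the coefficients are independent with zero mean, all cross-terms vanish in expectation, and $\|\mathbf{a}_M\mathbf{a}_K^{\mathrm{T}}\|_F^2 = MK$. Hence, the average squared Frobenius norm at subcarrier $\nu \in \{0,\ldots,S-1\}$ is

$$\begin{aligned}\mathbb{E}\{\|\mathbf{H}[\nu]\|_F^2\} &= \sum_{i=1}^{N_{\mathrm{cl}}}\mathbb{E}\left\{\left|\sum_{\ell=0}^{T}c_i[\ell]e^{-j2\pi\ell\nu/S}\right|^2\right\}MK \\ &= MK\sum_{i=1}^{N_{\mathrm{cl}}}\sum_{\ell=0}^{T}\mathbb{E}\{|c_i[\ell]|^2\} = MK\sum_{i=1}^{N_{\mathrm{cl}}}\sum_{\ell=0}^{T}\beta_i[\ell],\end{aligned} \tag{7.54}$$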


which is independent of the subcarrier index. The channel realizations will be 
different between subcarriers so that some are stronger than others momen- 
tarily, but all subcarriers are equally good statistically. 

In practice, the channel coefficients of different clusters are independent 
since they involve different physical paths. However, the channel taps of 
a given cluster can be slightly correlated since the pulse functions have a 
non-zero effective time duration, so each physical path affects multiple taps. 
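A Monte Carlo estimate can confirm (7.54). The sketch below draws independent c_i[ℓ] ~ N_C(0, β_i[ℓ]) for an assumed two-cluster power-delay profile and forms H[ν] on one subcarrier as in (7.53). It then compares the sample mean of ‖H[ν]‖_F² with MK Σ_i Σ_ℓ β_i[ℓ]. All sizes, angles, and variances are illustrative. The result holds for any array response with unit-modulus entries.

```python
import numpy as np

rng = np.random.default_rng(4)
M, K, S, nu = 4, 3, 32, 7                 # assumed sizes and subcarrier index
beta = np.array([[1.0, 0.5, 0.25],        # beta_i[l]: cluster i on rows, tap l on columns
                 [0.3, 0.3, 0.0]])
Ncl, n_taps = beta.shape
n_trials = 20000

def ula(N, az):
    # Assumed half-wavelength ULA response; unit-modulus entries give ||a||^2 = N
    return np.exp(-1j * np.pi * np.arange(N) * np.sin(az))

# Rank-one terms a_M a_K^T for two assumed cluster directions (receiver, transmitter)
A = np.stack([np.outer(ula(M, az_r), ula(K, az_t)) for az_r, az_t in [(0.2, -0.5), (-0.7, 0.9)]])

# Independent c_i[l] ~ CN(0, beta_i[l]) in every trial
c = np.sqrt(beta / 2) * (rng.standard_normal((n_trials, Ncl, n_taps))
                         + 1j * rng.standard_normal((n_trials, Ncl, n_taps)))
d = c @ np.exp(-2j * np.pi * np.arange(n_taps) * nu / S)   # DFT weights in (7.53)
H = np.einsum("ti,imk->tmk", d, A)
empirical = np.mean(np.sum(np.abs(H) ** 2, axis=(1, 2)))
theory = M * K * beta.sum()                                # (7.54)
print(empirical, theory)
```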




7.3.1 One Dominant Cluster: Analog Beamforming is Sufficient 


There are propagation scenarios where one of the clusters is significantly stronger than the others, for example, because it provides specular reflection while all other clusters provide diffuse scattering. Moreover, if there is a LOS path, it is typically much stronger than the scattered paths. It can be modeled similarly to a cluster but with a deterministic $c_i[\ell]$ and a short power-delay profile.⁵ In this section, we assume that $i = 1$ is the dominant cluster. The general subcarrier channel matrix in (7.53) can then be approximated as


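$$\mathbf{H}[\nu] \approx \left(\sum_{\ell=0}^{T} c_1[\ell]e^{-j2\pi\ell\nu/S}\right)\mathbf{a}_M(\varphi_{r,1},\theta_{r,1})\mathbf{a}_K^{\mathrm{T}}(\varphi_{t,1},\theta_{t,1}), \quad \nu = 0,\ldots,S-1. \tag{7.55}$$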
This is an approximately rank-one matrix with $\sqrt{MK}\,\left|\sum_{\ell=0}^{T} c_1[\ell]e^{-j2\pi\ell\nu/S}\right|$ being the only non-zero singular value. This value varies with the subcarrier index $\nu$, but the singular vectors remain the same. At every subcarrier, it is optimal for the transmitter to apply MRT with $\mathbf{p}_1 = \mathbf{a}_K^*(\varphi_{t,1},\theta_{t,1})/\sqrt{K}$ and for the receiver to use MRC with $\mathbf{w}_1 = \mathbf{a}_M(\varphi_{r,1},\theta_{r,1})/\sqrt{M}$. The resulting effective SISO channel on subcarrier $\nu$ is


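$$\mathbf{w}_1^{\mathrm{H}}\mathbf{H}[\nu]\mathbf{p}_1 = \left(\sum_{\ell=0}^{T} c_1[\ell]e^{-j2\pi\ell\nu/S}\right)\frac{\|\mathbf{a}_M(\varphi_{r,1},\theta_{r,1})\|^2\,\|\mathbf{a}_K(\varphi_{t,1},\theta_{t,1})\|^2}{\sqrt{M}\sqrt{K}} = \left(\sum_{\ell=0}^{T} c_1[\ell]e^{-j2\pi\ell\nu/S}\right)\sqrt{MK}. \tag{7.56}$$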


Thanks to MRT and MRC, the amplitude of each channel tap is increased by a factor $\sqrt{MK}$; thus, the maximum beamforming gain of $MK$ is achieved
in this setup. Since we only transmit one signal per subcarrier and use 
the same precoding/combining vectors on all subcarriers, we can implement 
the transmitter and receiver using a simpler architecture than the digital 
beamforming architecture illustrated in Figure 7.9. 
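The single-cluster case can be verified in a few lines. This sketch assumes a half-wavelength ULA response and builds H[ν] in (7.55) from random tap coefficients c_1[ℓ]. It applies the same MRT and MRC vectors on all subcarriers. It then checks that the effective channel equals √(MK) times the DFT of the taps, as in (7.56), and that its magnitude is the largest singular value of H[ν].

```python
import numpy as np

def ula(N, az):
    # Assumed half-wavelength ULA response with unit-modulus entries
    return np.exp(-1j * np.pi * np.arange(N) * np.sin(az))

rng = np.random.default_rng(5)
M, K, S, T = 6, 8, 32, 3
a_M = ula(M, np.deg2rad(15.0))    # assumed receive-side direction of cluster 1
a_K = ula(K, np.deg2rad(-25.0))   # assumed transmit-side direction of cluster 1
c1 = (rng.standard_normal(T + 1) + 1j * rng.standard_normal(T + 1)) / np.sqrt(2)

C1 = np.fft.fft(c1, n=S)                       # sum_l c_1[l] exp(-j 2 pi l nu / S) for all nu
H = C1[:, None, None] * np.outer(a_M, a_K)     # (7.55), shape (S, M, K)

p1 = a_K.conj() / np.sqrt(K)                   # MRT, common to all subcarriers
w1 = a_M / np.sqrt(M)                          # MRC, common to all subcarriers
effective = np.array([w1.conj() @ H[nu] @ p1 for nu in range(S)])
print(np.allclose(effective, np.sqrt(M * K) * C1))  # True, as in (7.56)
```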

The simplified analog beamforming architecture is shown in Figure 7.10. 
The transmitter only generates one OFDM signal sequence in the BBU and 
uses a DAC and up-converter to transform it into an analog passband signal 
centered at the carrier frequency. This signal is then divided into K branches, 
one per antenna. Each branch contains a phase shifter (PS) and a PA, which is 
sufficient to implement the multiplication with the phase-shift and amplitude 
of one of the entries in $\mathbf{p}_1$. This is called analog beamforming (or a phased
array) because the beamforming operation is implemented in the analog 
part of the transmitter, in contrast to the digital baseband as in the digital 
beamforming architecture. There are multiple ways to implement PSs. If 


⁵ The addition of a LOS path to the clustered multipath model was previously considered in Example 5.18. Although there is only a single LOS path, it can contribute to multiple taps since the sinc-pulse has a long time duration.


512 Wideband MIMO Channels and Practical Aspects 


Figure 7.10: Block diagrams of the main components between the baseband unit and the antennas when using the analog beamforming architecture with K = M = 4 antennas: (a) transmitter with a power divider; (b) receiver with a power combiner. The phase shifters at the transmitter are used to implement a precoding vector common to all subcarriers, while the phase shifters at the receiver implement a combining vector common to all subcarriers.


a few predefined phase-shift values suffice (which restricts the selection of $\mathbf{p}_1$), then the circuit can contain transmission lines of different lengths, each causing a propagation delay that matches one of those phase shifts. This kind of digital PS circuit controls the phase by switching between the transmission lines that the signal propagates through. There are also analog PS circuits that can control the phase continuously, for example, by controlling a voltage that determines which phase shift the circuit imposes on the signal. Each PS causes a relative power loss of a few dB when shifting the phase, called an insertion loss. There are similar losses in the power divider.
Hence, to minimize the total power dissipation in the transmitter (and the 
need for cooling), the signals are not amplified until right before the antennas, 
which is why the PAs are placed after the PSs in Figure 7.10. In principle, 
the PAs could operate at different powers, but this feature is not needed in 
the LOS and single-cluster scenarios that analog beamforming is meant for. 
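The sketch below illustrates the restriction imposed by digital PSs with a few predefined phase shifts. It rounds each phase of the MRT vector to the nearest of 2^b uniformly spaced values and measures the beamforming gain relative to the ideal value K. The uniform b-bit codebook, the ULA response, and the function name `quantized_mrt_gain` are assumptions made for illustration. Insertion losses are not modeled.

```python
import numpy as np

def quantized_mrt_gain(K, azimuth, bits):
    """Beamforming gain |a^T p|^2 when the MRT phases are rounded to 2**bits levels.

    The ULA response and the uniform phase codebook are assumptions for illustration.
    """
    a = np.exp(-1j * np.pi * np.arange(K) * np.sin(azimuth))
    ideal_phase = np.angle(a.conj())                     # phases of the MRT vector a^*/sqrt(K)
    step = 2 * np.pi / 2 ** bits
    quant_phase = step * np.round(ideal_phase / step)    # nearest available phase shift
    p = np.exp(1j * quant_phase) / np.sqrt(K)            # unit-modulus entries, ||p|| = 1
    return np.abs(a @ p) ** 2                            # equals K without quantization

K, az = 16, np.deg2rad(20.0)
for b in (1, 2, 3, 6):
    loss_db = 10 * np.log10(K / quantized_mrt_gain(K, az, b))
    print(f"{b}-bit phase shifters: {loss_db:.2f} dB below the ideal gain K")

g1 = quantized_mrt_gain(K, az, 1)
g10 = quantized_mrt_gain(K, az, 10)
```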


The receiver carries out nearly the same operations but in the opposite 
order. The real passband signal received at a specific antenna is amplified by 
an LNA and then phase-shifted using a PS unit. The M phase-shifted received 
signals are then added to obtain a combined signal that is down-converted 
and sampled by an ADC. The LNA is placed before the PS since the signals received at the antennas are typically much weaker than the minimum input power that a PS can handle. Since the received signal power can vary by many
tens of dB depending on the propagation conditions, the amplification level in 
the LNA must be dynamically adjusted to maintain an almost constant output 
power. This feature is called automatic gain control and is implemented as a 
feedback loop between the amplifier output and the regulating circuit. This is 
done in both analog and digital architectures. 


The analog beamforming architecture is tailored to propagation scenar- 
ios with a single dominant cluster (or LOS path). We can thereby reduce 
the number of converters (i.e., DACs, ADCs, and up/down-converters) since all antennas share them, but it comes at the expense of adding PSs and power dividers/combiners. We can obtain beamforming gains and spatial diversity
gains in this way but no multiplexing gains. If the channel contains more than 
one strong cluster, then the achievable beamforming gain is less than what 
can be reached using digital beamforming, and the achievable rate can be 
much below the capacity since spatial multiplexing cannot be used. 


Figure 7.11 exemplifies the effective channel gain that can be achieved on 
different subcarriers by beamforming in different angular directions. There 
are $K = 5$ transmit antennas and $S = 200$ subcarriers. The channel gains are normalized so that the maximum is 0 dB. There is $N_{\mathrm{cl}} = 1$ cluster in Figure 7.11(a), and it spans one channel tap and is seen from the azimuth
angle 0°. The channel gain is the same on all subcarriers, and the beam 
pattern with its main beam and side-lobes is seen over the angles. This is a 
situation where analog beamforming is capacity-achieving. 


Figure 7.11(b) considers a case with $N_{\mathrm{cl}} = 3$ clusters, which are located in the angular directions 0°, 25°, and −35°. The clusters have equally strong channel gains but different time delays, so they appear in three different channel taps. Hence, the clusters interact to create channel variations between the subcarriers, known as frequency-selective fading. There is no angular direction that simultaneously maximizes the channel gain on all subcarriers. Instead, we would have to vary the precoding over the subcarriers to apply MRT on each of them. Analog beamforming cannot achieve the maximum beamforming gain in this case, and it cannot utilize the three clusters for spatial multiplexing either. The next section will determine a simplified hardware architecture tailored to the case with $\min(M,K) > N_{\mathrm{cl}} > 1$.
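The qualitative behavior in Figure 7.11 can be reproduced with a simple model. In this sketch, each cluster contributes a unit coefficient in a single tap, which is an assumption that mimics the described setup. The precoder for each candidate angle is the normalized conjugate array response, and the spread of the normalized gain over the subcarriers is measured per angle. Apart from K = 5, S = 200, and the cluster angles, the parameters of the figure are not reproduced.

```python
import numpy as np

def ula(N, az):
    # Assumed half-wavelength ULA response
    return np.exp(-1j * np.pi * np.arange(N) * np.sin(az))

K, S = 5, 200
scan = np.deg2rad(np.linspace(-90.0, 90.0, 181))   # candidate beam directions
# Array-response precoders p(angle) = a_K^*(angle) / sqrt(K), one column per angle
P = np.stack([ula(K, az).conj() for az in scan], axis=1) / np.sqrt(K)

def effective_gain(cluster_deg, cluster_taps):
    """Normalized |h[nu]^T p(angle)|^2 for a MISO channel with equally strong clusters.

    Each cluster contributes a unit coefficient in one tap (an assumed simplification).
    Returns an (S, len(scan)) array whose maximum is 1, i.e., 0 dB.
    """
    h_taps = np.zeros((max(cluster_taps) + 1, K), dtype=complex)
    for az, tap in zip(np.deg2rad(cluster_deg), cluster_taps):
        h_taps[tap] += ula(K, az)
    h = np.fft.fft(h_taps, n=S, axis=0)     # h[nu] for nu = 0, ..., S-1
    gain = np.abs(h @ P) ** 2
    return gain / gain.max()

one = effective_gain([0.0], [0])
three = effective_gain([0.0, 25.0, -35.0], [0, 1, 2])
spread_one = (one.max(axis=0) - one.min(axis=0)).max()       # variation over subcarriers
spread_three = (three.max(axis=0) - three.min(axis=0)).max()
print(spread_one, spread_three)
```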

The precoding and combining are applied to the time-domain passband signals when using analog beamforming, whereas they are applied in the frequency domain in digital beamforming. The equivalence between these implementations can be established mathematically. We begin by taking the IDFT of the MIMO-OFDM received signal in (7.44), which is given at time instance $l$ as


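$$\begin{aligned}\mathbf{y}[l] &= \frac{1}{\sqrt{S}}\sum_{\nu=0}^{S-1}\mathbf{y}[\nu]e^{j2\pi l\nu/S} \\ &= \frac{1}{\sqrt{S}}\sum_{\nu=0}^{S-1}\mathbf{H}[\nu]\mathbf{x}[\nu]e^{j2\pi l\nu/S} + \underbrace{\frac{1}{\sqrt{S}}\sum_{\nu=0}^{S-1}\mathbf{n}[\nu]e^{j2\pi l\nu/S}}_{=\mathbf{n}[l]} \\ &= \sum_{\ell=0}^{T}\mathbf{H}[\ell]\mathbf{x}[(l-\ell)\bmod S] + \mathbf{n}[l], \quad l = 0,\ldots,S-1,\end{aligned} \tag{7.57}$$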


where we used the cyclic convolution theorem from Lemma 2.15. By inserting the time-domain channel $\mathbf{H}[\ell]$ from (7.51) with $N_{\mathrm{cl}} = 1$ cluster into (7.57),
(a) One multipath cluster appearing in one tap and seen in the direction 0°.
(b) Three multipath clusters appearing in different taps and seen in the directions 0°, 25°, and −35°.

Figure 7.11: The effective channel gain (i.e., the squared magnitude of the inner product between the channel and precoding vector) can vary with the subcarrier index and beam angle (assuming that the precoding vector is an array response vector). The size of the variations depends on the number of multipath clusters and their respective time delays. There are K = 5 antennas and S = 200 subcarriers, but different numbers of clusters in (a) and (b).

we obtain

$$\mathbf{y}[l] = \sum_{\ell=0}^{T} c_1[\ell]\,\mathbf{a}_M(\varphi_{r,1},\theta_{r,1})\mathbf{a}_K^{\mathrm{T}}(\varphi_{t,1},\theta_{t,1})\mathbf{x}[(l-\ell)\bmod S] + \mathbf{n}[l]. \tag{7.58}$$


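At every time instance $l \in \{0,\ldots,S-1\}$, we maximize the received power by applying the precoding vector $\mathbf{p}_1 = \mathbf{a}_K^*(\varphi_{t,1},\theta_{t,1})/\sqrt{K}$ and combining vector $\mathbf{w}_1 = \mathbf{a}_M(\varphi_{r,1},\theta_{r,1})/\sqrt{M}$. By writing the transmitted signal with time-domain precoding as $\mathbf{x}[l] = \mathbf{p}_1 x[l]$ and applying receive combining to $\mathbf{y}[l]$ as $y[l] = \mathbf{w}_1^{\mathrm{H}}\mathbf{y}[l]$, we obtain the equivalent SISO-OFDM system

$$y[l] = \sum_{\ell=0}^{T} c_1[\ell]\sqrt{MK}\,x[(l-\ell)\bmod S] + n[l], \tag{7.59}$$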


where $n[l] = \mathbf{w}_1^{\mathrm{H}}\mathbf{n}[l]$ is the noise. Every tap is scaled by a factor $\sqrt{MK}$, precisely as in (7.56), which was derived based on frequency-domain precoding/combining. Hence, the frequency-domain representation of the effective SISO channel in (7.59) is the same as in (7.56). We stress that time-domain precoding/combining leads to using the same precoding/combining vectors on all subcarriers. The equivalence therefore only holds when this is our goal (i.e., when having one dominant cluster).⁶
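The equivalence between (7.59) and (7.56) can be checked numerically. This sketch applies time-domain precoding to one OFDM block and passes it through the cyclic convolution in (7.57) with a single-cluster channel, with noise omitted. It then applies time-domain combining. Finally, it compares the unitary DFT of the output with √(MK) times the channel's DFT weights multiplied by the unitary DFT of the input. The ULA response and all sizes are assumptions.

```python
import numpy as np

def ula(N, az):
    # Assumed half-wavelength ULA response with unit-modulus entries
    return np.exp(-1j * np.pi * np.arange(N) * np.sin(az))

rng = np.random.default_rng(6)
M, K, S, T = 4, 6, 64, 5
a_M, a_K = ula(M, 0.3), ula(K, -0.6)
c1 = (rng.standard_normal(T + 1) + 1j * rng.standard_normal(T + 1)) / np.sqrt(2)
H_taps = c1[:, None, None] * np.outer(a_M, a_K)      # (7.51) with one cluster

x = (rng.standard_normal(S) + 1j * rng.standard_normal(S)) / np.sqrt(2)  # time-domain OFDM block
p1, w1 = a_K.conj() / np.sqrt(K), a_M / np.sqrt(M)

# Time-domain precoding, cyclic convolution as in (7.57) without noise, then combining
X_vec = np.outer(x, p1)                                # row l is the precoded vector at time l
Y_vec = sum(np.roll(X_vec, ell, axis=0) @ H_taps[ell].T for ell in range(T + 1))
y = Y_vec @ w1.conj()                                  # y[l] = w1^H y[l]

# Frequency-domain view: unitary DFTs of input and output, channel weights as in (7.56)
lhs = np.fft.fft(y, norm="ortho")
rhs = np.sqrt(M * K) * np.fft.fft(c1, n=S) * np.fft.fft(x, norm="ortho")
print(np.allclose(lhs, rhs))  # True
```

`np.roll(X_vec, ell, axis=0)[l]` equals `X_vec[(l - ell) % S]`, which is exactly the cyclic shift in (7.57).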


7.3.2 A Few Dominant Clusters: Hybrid Beamforming is Sufficient 


A more general scenario where the hardware architecture can also be simplified 
is when the number of clusters $N_{\mathrm{cl}}$ is smaller than $\min(M,K)$. In this case, the rank of the channel matrices in (7.53) equals $N_{\mathrm{cl}}$ (or could
possibly be even smaller), and this is the maximum number of parallel data 
streams that need to be transmitted and received per subcarrier. We can 
express the channel matrix on subcarrier v as 


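$$\mathbf{H}[\nu] = \mathbf{A}_r\mathbf{D}[\nu]\mathbf{A}_t^{\mathrm{T}} \tag{7.60}$$

by using the matrix notation

$$\mathbf{A}_r = [\mathbf{a}_M(\varphi_{r,1},\theta_{r,1}) \ \cdots \ \mathbf{a}_M(\varphi_{r,N_{\mathrm{cl}}},\theta_{r,N_{\mathrm{cl}}})] \in \mathbb{C}^{M\times N_{\mathrm{cl}}}, \tag{7.61}$$

$$\mathbf{D}[\nu] = \mathrm{diag}\left(\sum_{\ell=0}^{T} c_1[\ell]e^{-j2\pi\ell\nu/S}, \ldots, \sum_{\ell=0}^{T} c_{N_{\mathrm{cl}}}[\ell]e^{-j2\pi\ell\nu/S}\right) \in \mathbb{C}^{N_{\mathrm{cl}}\times N_{\mathrm{cl}}}, \tag{7.62}$$

$$\mathbf{A}_t = [\mathbf{a}_K(\varphi_{t,1},\theta_{t,1}) \ \cdots \ \mathbf{a}_K(\varphi_{t,N_{\mathrm{cl}}},\theta_{t,N_{\mathrm{cl}}})] \in \mathbb{C}^{K\times N_{\mathrm{cl}}}. \tag{7.63}$$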

The $N_{\mathrm{cl}}\times N_{\mathrm{cl}}$ diagonal matrix $\mathbf{D}[\nu]$ varies between the subcarriers. In contrast, the matrices $\mathbf{A}_t$ and $\mathbf{A}_r$ with array response vectors remain the same since these


⁶ In principle, any linear frequency-domain precoding/combining that is constant over an OFDM symbol can alternatively be implemented using time-domain filters, but it generally requires more complex impulse responses than what can be implemented using PSs.
describe the cluster geometry. When $N_{\mathrm{cl}} < \min(M,K)$, these matrices expand the channel dimension from $N_{\mathrm{cl}}\times N_{\mathrm{cl}}$ in $\mathbf{D}[\nu]$ to $M\times K$ in $\mathbf{H}[\nu]$.

If the transmitter uses a precoding vector that is not within the span of $\mathbf{A}_t^*$ (i.e., it cannot be written as a linear combination of its columns), then the component outside that span lies in the null space of $\mathbf{H}[\nu]$ and cannot reach the receiver. This would be a waste of signal power; thus, any transmit precoding matrix $\mathbf{P}[\nu]$ of practical interest on subcarrier $\nu$ can be expressed as

$$\mathbf{P}[\nu] = \mathbf{A}_t^*\mathbf{P}_{\mathrm{BB}}[\nu], \tag{7.64}$$

where $\mathbf{P}_{\mathrm{BB}}[\nu] \in \mathbb{C}^{N_{\mathrm{cl}}\times N_{\mathrm{cl}}}$ is the subcarrier-unique part with a dimension that matches the channel's rank. The subscript indicates that this part of the precoding matrix must be generated in the baseband (BB) before the IDFT is used to generate the time-domain OFDM signal sequence.

Similarly, if the receiver uses a receive combining vector $\mathbf{w}$ that is not within the span of $\mathbf{A}_r$, it will try to extract signals from dimensions in $\mathbb{C}^M$ where the channel matrix can never place any signal components. Hence, any combining matrix $\mathbf{W}[\nu]$ of practical interest on subcarrier $\nu$ can be expressed as

$$\mathbf{W}[\nu] = \mathbf{A}_r\mathbf{W}_{\mathrm{BB}}[\nu], \tag{7.65}$$

where $\mathbf{W}_{\mathrm{BB}}[\nu] \in \mathbb{C}^{N_{\mathrm{cl}}\times N_{\mathrm{cl}}}$ is the subcarrier-unique part of reduced dimension.

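Suppose we generate the transmitted signal on subcarrier $\nu$ as $\mathbf{x}[\nu] = \mathbf{P}[\nu]\bar{\mathbf{x}}[\nu]$, where $\bar{\mathbf{x}}[\nu] \in \mathbb{C}^{N_{\mathrm{cl}}}$ is the data signal. By applying the combining matrix in (7.65) to the received signal in (7.47), we obtain

$$\mathbf{W}^{\mathrm{H}}[\nu]\mathbf{y}[\nu] = \mathbf{W}_{\mathrm{BB}}^{\mathrm{H}}[\nu]\underbrace{\mathbf{A}_r^{\mathrm{H}}\mathbf{H}[\nu]\mathbf{A}_t^*}_{\text{Analog domain}}\mathbf{P}_{\mathrm{BB}}[\nu]\bar{\mathbf{x}}[\nu] + \underbrace{\mathbf{W}_{\mathrm{BB}}^{\mathrm{H}}[\nu]\mathbf{A}_r^{\mathrm{H}}\mathbf{n}[\nu]}_{\text{Effective noise}}. \tag{7.66}$$

The pre-processing by $\mathbf{A}_t^*$ at the transmitter and post-processing by $\mathbf{A}_r^{\mathrm{H}}$ can be implemented in the analog domain in the transceiver hardware since these matrices are common to all subcarriers. Since these matrices contain array response vectors, each entry represents a phase shift that can be implemented using a PS, as in the previous case of analog beamforming.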

Figure 7.12 illustrates a possible hardware architecture for the case of $M = K = 4$ and $N_{\mathrm{cl}} = 2$. The transmitter generates $N_{\mathrm{cl}} = 2$ OFDM signals in the BBU, each representing one of the entries of $\mathbf{P}_{\mathrm{BB}}[\nu]\bar{\mathbf{x}}[\nu] \in \mathbb{C}^2$ for $\nu = 0,\ldots,S-1$. Each OFDM signal is transformed into an analog passband signal using a DAC and up-converter. It is then multiplied with the respective column of $\mathbf{A}_t^* \in \mathbb{C}^{4\times 2}$ by using the upper and lower collection of PSs, respectively. Each collection contains $K = 4$ PSs. The phase-shifted signals are then sent to the respective antennas, where they are added up before being amplified and radiated.

The opposite procedure is carried out at the receiver side, where the 
received signal at any given antenna is first amplified by an LNA. The signal is 
then divided into two parts that are sent to different sets of PSs, representing 
Figure 7.12: Block diagrams of the main components between the baseband unit and the antennas when using the hybrid beamforming architecture with K = M = 4 antennas: (a) transmitter; (b) receiver. The power dividers and combiners are illustrated with circles as in Figure 7.10 but not labeled with text to avoid clutter.


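the different rows of $\mathbf{A}_r^{\mathrm{H}}$. The signals are then added up within each branch, down-converted, and sampled to obtain two output signals in the BBU. The effective noise in (7.66) has the covariance matrix

$$N_0\mathbf{W}_{\mathrm{BB}}^{\mathrm{H}}[\nu]\mathbf{A}_r^{\mathrm{H}}\mathbf{A}_r\mathbf{W}_{\mathrm{BB}}[\nu], \tag{7.67}$$

which should preferably equal $N_0\mathbf{I}_{N_{\mathrm{cl}}}$ so that the noise is white. This condition is satisfied by combining matrices of the kind $\mathbf{W}_{\mathrm{BB}}[\nu] = (\mathbf{A}_r^{\mathrm{H}}\mathbf{A}_r)^{-1/2}\mathbf{U}_{\mathrm{BB}}[\nu]$, where $\mathbf{U}_{\mathrm{BB}}[\nu] \in \mathbb{C}^{N_{\mathrm{cl}}\times N_{\mathrm{cl}}}$ can be any unitary matrix.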
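The whitening condition for (7.67) can be verified numerically. The sketch computes the inverse matrix square root of A_r^H A_r through an eigendecomposition and draws a random unitary U_BB[ν]. It then checks that the resulting noise covariance equals N_0 I. The array response and cluster angles are assumed. The columns of A_r are deliberately non-orthogonal so that the whitening step matters.

```python
import numpy as np

def inv_sqrtm_hermitian(G):
    # Inverse square root of a Hermitian positive definite matrix via eigendecomposition
    eigval, eigvec = np.linalg.eigh(G)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.conj().T

def ula(N, az):
    # Assumed half-wavelength ULA response
    return np.exp(-1j * np.pi * np.arange(N) * np.sin(az))

rng = np.random.default_rng(7)
M, N0 = 8, 0.5
A_r = np.stack([ula(M, az) for az in np.deg2rad([-30.0, 10.0, 40.0])], axis=1)  # M x Ncl
Ncl = A_r.shape[1]

Z = rng.standard_normal((Ncl, Ncl)) + 1j * rng.standard_normal((Ncl, Ncl))
U_BB, _ = np.linalg.qr(Z)                                   # a random unitary matrix
W_BB = inv_sqrtm_hermitian(A_r.conj().T @ A_r) @ U_BB

noise_cov = N0 * W_BB.conj().T @ A_r.conj().T @ A_r @ W_BB  # (7.67)
print(np.allclose(noise_cov, N0 * np.eye(Ncl)))             # True
```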

We have described an instance of the hybrid analog-digital beamforming architecture [72], [113]. The name indicates that the precoding and combining operations are divided between the analog and digital domains. One could view this as a generalized architecture since $N_{\mathrm{cl}} = 1$ results in analog beamforming and $N_{\mathrm{cl}} = M = K$ is equivalent to digital beamforming. However, the reality is more complicated. $N_{\mathrm{cl}}$ is a variable that changes with the propagation environment where the transmitter/receiver is used, while the hardware architecture must remain fixed. Hence, it is more suitable to decouple these variables and let $N_{\mathrm{RF}}$ denote the number of radio-frequency (RF) signals generated in the transmitter's BBU and sampled at the receiver; that is, the number of DACs/ADCs and up/down-converters. The hybrid architecture is sufficient to achieve the MIMO capacity if $N_{\mathrm{cl}} \leq N_{\mathrm{RF}}$. The following table summarizes the number of hardware components needed in the transmitter (or receiver by replacing $K$ with $M$):


Component          | Digital | Hybrid                 | Analog
Converters         | K       | $N_{\mathrm{RF}}$      | 1
Phase shifters     | 0       | $N_{\mathrm{RF}}K$     | K
Power amplifiers   | K       | K                      | K


The choice between these architectures requires making tradeoffs. The 
number of converters (i.e., ADC/DAC, up/down) can be reduced by going 
from a digital to a hybrid or analog architecture, but at the expense of 
requiring PSs and being unable to achieve the capacity when $N_{\mathrm{cl}} > N_{\mathrm{RF}}$.
Digital beamforming is predominant in systems operating in the low-band 
and mid-band, while analog/hybrid beamforming is common in the high- 
band. A general trend seems to be that the frequency range for which digital 
architectures are practically feasible gradually increases. However, this does 
not change the fact that some propagation environments (e.g., LOS-dominant 
scenarios) do not require the extra capabilities the digital architecture provides. 

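When using an arbitrary value of $N_{\mathrm{RF}}$, we can denote the hybrid precoding matrix on subcarrier $\nu$ as $\mathbf{P}[\nu] = \mathbf{P}_{\mathrm{RF}}\mathbf{P}_{\mathrm{BB}}[\nu]$. Here, $\mathbf{P}_{\mathrm{RF}} \in \mathbb{C}^{K\times N_{\mathrm{RF}}}$ is the analog part and $\mathbf{P}_{\mathrm{BB}}[\nu] \in \mathbb{C}^{N_{\mathrm{RF}}\times N_{\mathrm{RF}}}$ is the subcarrier-specific digital part. The latter part can be further factorized as $\mathbf{P}_{\mathrm{BB}}[\nu] = (\mathbf{P}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{P}_{\mathrm{RF}})^{-1/2}\mathbf{V}_{\mathrm{BB}}[\nu]$, where $\mathbf{V}_{\mathrm{BB}}[\nu] \in \mathbb{C}^{N_{\mathrm{RF}}\times N_{\mathrm{RF}}}$ is the effective precoding matrix. The transmitter can freely select it because it has the same power as $\mathbf{P}[\nu]$:

$$\|\mathbf{P}[\nu]\|_F = \|\mathbf{V}_{\mathrm{BB}}[\nu]\|_F, \tag{7.68}$$

where $\|\cdot\|_F$ denotes the Frobenius norm defined in (5.87).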
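The power equality (7.68) holds because P_RF(P_RF^H P_RF)^{-1/2} has orthonormal columns. The sketch below checks it for an arbitrary V_BB[ν] and an analog precoder with random unit-modulus entries. These entries mimic phase-shifter settings, which is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(8)
K, N_RF = 8, 3

# Analog precoder with unit-modulus entries (random phase-shifter settings, assumed)
P_RF = np.exp(2j * np.pi * rng.random((K, N_RF)))
V_BB = rng.standard_normal((N_RF, N_RF)) + 1j * rng.standard_normal((N_RF, N_RF))

# P_BB = (P_RF^H P_RF)^{-1/2} V_BB, with the inverse square root from an eigendecomposition
eigval, eigvec = np.linalg.eigh(P_RF.conj().T @ P_RF)
P_BB = eigvec @ np.diag(eigval ** -0.5) @ eigvec.conj().T @ V_BB
P = P_RF @ P_BB

print(np.linalg.norm(P, "fro"), np.linalg.norm(V_BB, "fro"))  # equal, as in (7.68)
```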

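Similarly, the combining matrix is denoted as $\mathbf{W}[\nu] = \mathbf{W}_{\mathrm{RF}}\mathbf{W}_{\mathrm{BB}}[\nu]$. Here, $\mathbf{W}_{\mathrm{RF}} \in \mathbb{C}^{M\times N_{\mathrm{RF}}}$ is the analog part and $\mathbf{W}_{\mathrm{BB}}[\nu] = (\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{W}_{\mathrm{RF}})^{-1/2}\mathbf{U}_{\mathrm{BB}}[\nu] \in \mathbb{C}^{N_{\mathrm{RF}}\times N_{\mathrm{RF}}}$ is the digital part, with $\mathbf{U}_{\mathrm{BB}}[\nu]$ being a unitary matrix.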

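Using this notation, the received signal in (7.66) can be reformulated as

$$\mathbf{W}^{\mathrm{H}}[\nu]\mathbf{y}[\nu] = \mathbf{U}_{\mathrm{BB}}^{\mathrm{H}}[\nu]\bar{\mathbf{H}}[\nu]\mathbf{V}_{\mathrm{BB}}[\nu]\bar{\mathbf{x}}[\nu] + \underbrace{\mathbf{W}_{\mathrm{BB}}^{\mathrm{H}}[\nu]\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{n}[\nu]}_{\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},N_0\mathbf{I}_{N_{\mathrm{RF}}})}, \tag{7.69}$$

where the data signal is $\bar{\mathbf{x}}[\nu] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{Q}[\nu])$ and $\mathbf{Q}[\nu]$ is the diagonal matrix with power coefficients. The effective channel matrix is denoted by $\bar{\mathbf{H}}[\nu] \in \mathbb{C}^{N_{\mathrm{RF}}\times N_{\mathrm{RF}}}$ and defined as

$$\bar{\mathbf{H}}[\nu] = (\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{W}_{\mathrm{RF}})^{-1/2}\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{H}[\nu]\mathbf{P}_{\mathrm{RF}}(\mathbf{P}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{P}_{\mathrm{RF}})^{-1/2}. \tag{7.70}$$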


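It follows from (3.106) that an achievable rate (in bit per subcarrier symbol) is

$$\log_2\left(\det\left(\mathbf{I}_{N_{\mathrm{RF}}} + \frac{1}{N_0}\bar{\mathbf{H}}[\nu]\mathbf{V}_{\mathrm{BB}}[\nu]\mathbf{Q}[\nu]\mathbf{V}_{\mathrm{BB}}^{\mathrm{H}}[\nu]\bar{\mathbf{H}}^{\mathrm{H}}[\nu]\right)\right). \tag{7.71}$$

This rate can be maximized by selecting $\mathbf{V}_{\mathrm{BB}}[\nu]$ as the right singular vectors of $\bar{\mathbf{H}}[\nu]$ and $\mathbf{Q}[\nu]$ according to water-filling power allocation.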

The expression in (7.70) demonstrates how the analog parts of the precoding and combining matrices transform the $M\times K$ channel matrix $\mathbf{H}[\nu]$ into an effective channel matrix $\bar{\mathbf{H}}[\nu]$ with the reduced dimensions $N_{\mathrm{RF}}\times N_{\mathrm{RF}}$. This limits the maximum multiplexing gain to $N_{\mathrm{RF}}$ and reduces the maximum achievable rate if the original channel matrix had a higher rank.
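The dimension reduction in (7.70) and the rate in (7.71) can be sketched as follows. The function forms the N_RF × N_RF effective channel and uses its singular values, which corresponds to choosing V_BB[ν] as its right singular vectors. It performs water-filling by bisection. Setting both analog parts to identity matrices recovers the digital benchmark, which the hybrid rate should never exceed. The function names and the random phase-shifter settings are hypothetical.

```python
import numpy as np

def inv_sqrtm(G):
    # Inverse square root of a Hermitian positive definite matrix via eigendecomposition
    w, U = np.linalg.eigh(G)
    return U @ np.diag(w ** -0.5) @ U.conj().T

def hybrid_rate(H, P_RF, W_RF, q, N0):
    """Effective channel (7.70) and the rate (7.71) with SVD precoding and water-filling.

    Returns (H_eff, rate in bit per subcarrier symbol) for one subcarrier with power q.
    """
    H_eff = (inv_sqrtm(W_RF.conj().T @ W_RF) @ W_RF.conj().T @ H
             @ P_RF @ inv_sqrtm(P_RF.conj().T @ P_RF))
    s = np.linalg.svd(H_eff, compute_uv=False)
    gains = s[s > 1e-12] ** 2
    floors = N0 / gains
    lo, hi = 0.0, floors.max() + q          # bisection for the water level
    for _ in range(200):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - floors, 0.0).sum() > q:
            hi = mu
        else:
            lo = mu
    powers = np.maximum(mu - floors, 0.0)
    return H_eff, np.sum(np.log2(1 + powers * gains / N0))

rng = np.random.default_rng(9)
M = K = 6
N_RF, q, N0 = 2, 1.0, 0.1
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
P_RF = np.exp(2j * np.pi * rng.random((K, N_RF)))   # assumed random phase-shifter settings
W_RF = np.exp(2j * np.pi * rng.random((M, N_RF)))
H_eff, rate_hybrid = hybrid_rate(H, P_RF, W_RF, q, N0)
_, rate_digital = hybrid_rate(H, np.eye(K), np.eye(M), q, N0)  # identity RF parts = digital
```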
(a) Multipath clusters. (b) Digital beamforming. (c) Hybrid beamforming.

Figure 7.13: A sparse multipath propagation scenario with M = K = 5 antennas and $N_{\mathrm{cl}} = 5$ clusters distributed in angle to match perfectly with the beamspace representation, as illustrated in (a). The transmitter is to the right, the receiver to the left, and the dotted lines show the boundaries between the angular intervals considered in the beamspace representation. The digital beamforming architecture achieves the MIMO capacity by transmitting/receiving one beam per cluster as in (b). The hybrid beamforming architecture can only transmit as many beams as there are RF inputs/outputs, which is $N_{\mathrm{RF}} = 2$ in (c).


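Example 7.5. Consider a hybrid ULA architecture with $M = K$ antennas and $N_{\mathrm{RF}}$ RF inputs/outputs. What rate can be achieved over the channel in (7.47) if each of $\mathbf{A}_t$ and $\mathbf{A}_r$ contains $N_{\mathrm{cl}} \geq N_{\mathrm{RF}}$ columns from the scaled DFT matrix $\sqrt{M}\mathbf{F}_M$, and $c_i[\ell] = \sqrt{\beta/N_{\mathrm{cl}}}$ if $\ell = i$ and $c_i[\ell] = 0$ otherwise?

The columns of $\mathbf{A}_t$ are orthogonal, and the same holds for $\mathbf{A}_r$. Hence, we can use at most $N_{\mathrm{RF}} \leq N_{\mathrm{cl}}$ of the clusters. Since all clusters are equally strong, we can select $\mathbf{P}_{\mathrm{RF}}$ as the first $N_{\mathrm{RF}}$ columns of $\mathbf{A}_t^*$ and $\mathbf{W}_{\mathrm{RF}}$ as the first $N_{\mathrm{RF}}$ columns of $\mathbf{A}_r$ without loss of optimality. As these columns originate from the DFT matrix, the channel can be represented in the beamspace. The hybrid transmitter/receiver will then point beams directly toward $N_{\mathrm{RF}}$ of the $N_{\mathrm{cl}}$ clusters, as illustrated in Figure 7.13(c) for $N_{\mathrm{RF}} = 2$ and $N_{\mathrm{cl}} = 5$. The effective channel matrix in (7.70) simplifies to

$$\bar{\mathbf{H}}[\nu] = \sqrt{MK}\,\mathrm{diag}\left(\sqrt{\frac{\beta}{N_{\mathrm{cl}}}}e^{-j2\pi\nu/S}, \ldots, \sqrt{\frac{\beta}{N_{\mathrm{cl}}}}e^{-j2\pi N_{\mathrm{RF}}\nu/S}\right) \tag{7.72}$$

since $\mathbf{P}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{P}_{\mathrm{RF}} = \mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{W}_{\mathrm{RF}} = M\mathbf{I}_{N_{\mathrm{RF}}}$ and $\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{A}_r = \mathbf{P}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{A}_t^* = [M\mathbf{I}_{N_{\mathrm{RF}}}, \mathbf{0}]$. All $N_{\mathrm{RF}}$ singular values of $\bar{\mathbf{H}}[\nu]$ equal $\sqrt{\beta/N_{\mathrm{cl}}}\,M$. This turns water-filling into equal power allocation of $q/N_{\mathrm{RF}}$ per stream, so that $\mathbf{V}_{\mathrm{BB}}[\nu]\mathbf{Q}[\nu]\mathbf{V}_{\mathrm{BB}}^{\mathrm{H}}[\nu] = (q/N_{\mathrm{RF}})\mathbf{I}_{N_{\mathrm{RF}}}$. Hence, the rate in (7.71) becomes

$$N_{\mathrm{RF}}\log_2\left(1 + \frac{q\beta}{N_0}\frac{MK}{N_{\mathrm{cl}}N_{\mathrm{RF}}}\right) \ \text{bit per subcarrier symbol}. \tag{7.73}$$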
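The closed-form results of Example 7.5 can be cross-checked against direct SVD computations. This sketch builds A_t and A_r from columns of the scaled DFT matrix and uses c_i[ℓ] = √(β/N_cl) in tap ℓ = i, as in the example. It compares the numerical hybrid rate with (7.73) and the numerical digital rate with N_cl log2(1 + SNR·MK/N_cl²). The values of β, q, N_0, S, and ν are assumptions.

```python
import numpy as np

M = K = 5
N_RF, beta, q, N0 = 2, 1.0, 1.0, 0.01      # assumed values, SNR = q*beta/N0 = 20 dB
S, nu = 16, 3                               # assumed number of subcarriers and subcarrier index
F = np.fft.fft(np.eye(M)) / np.sqrt(M)      # unitary DFT matrix F_M

def example_rates(Ncl):
    A_t = A_r = np.sqrt(M) * F[:, :Ncl]     # orthogonal columns of the scaled DFT matrix
    # c_i[l] = sqrt(beta/Ncl) if l = i, so cluster i gets the weight exp(-j 2 pi i nu / S)
    D = np.diag(np.sqrt(beta / Ncl) * np.exp(-2j * np.pi * np.arange(1, Ncl + 1) * nu / S))
    H = A_r @ D @ A_t.T                      # (7.60)
    P_RF, W_RF = A_t.conj()[:, :N_RF], A_r[:, :N_RF]
    H_eff = W_RF.conj().T @ H @ P_RF / M     # (7.70), since P_RF^H P_RF = W_RF^H W_RF = M I
    s_h = np.linalg.svd(H_eff, compute_uv=False)
    hybrid = np.sum(np.log2(1 + (q / N_RF) * s_h ** 2 / N0))
    s_d = np.linalg.svd(H, compute_uv=False)[:Ncl]
    digital = np.sum(np.log2(1 + (q / Ncl) * s_d ** 2 / N0))   # equal gains: equal power
    closed_hybrid = N_RF * np.log2(1 + q * beta * M * K / (N0 * Ncl * N_RF))   # (7.73)
    closed_digital = Ncl * np.log2(1 + (q * beta / N0) * M * K / Ncl ** 2)
    return hybrid, closed_hybrid, digital, closed_digital

h2, ch2, d2, cd2 = example_rates(2)
h5, ch5, d5, cd5 = example_rates(5)
print(h2, d2, h5, d5)
```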
Figure 7.14: The capacity per OFDM subcarrier symbol achieved in a setup with M = K = 5 antennas and either $N_{\mathrm{cl}} = 2$ or $N_{\mathrm{cl}} = 5$ multipath clusters. The digital beamforming architecture is compared with the hybrid architecture with $N_{\mathrm{RF}} = 2$. The architectures achieve different capacities when $N_{\mathrm{cl}} > N_{\mathrm{RF}}$.


The simplified rate expression in (7.73) for hybrid beamforming showcases the essential limitations of this architecture. The multiplexing gain is $N_{\mathrm{RF}}$, even if the channel supports a larger multiplexing gain $N_{\mathrm{cl}}$. The full beamforming gain $MK$ is achieved but divided by $N_{\mathrm{cl}}$ since the total channel gain $\beta$ is equally distributed between that many clusters. It is also divided by $N_{\mathrm{RF}}$ since the total transmit power is divided into that many pieces when performing spatial multiplexing. The channel capacity in the considered setup is $N_{\mathrm{cl}}\log_2(1 + \mathrm{SNR}\cdot MK/N_{\mathrm{cl}}^2)$, where $\mathrm{SNR} = q\beta/N_0$. It can be achieved using the digital beamforming architecture as shown in Figure 7.13(b). The main difference between the two expressions is the multiplexing gain, which differs when $N_{\mathrm{cl}} > N_{\mathrm{RF}}$.

Figure 7.14 compares the rates achieved by the digital and hybrid architectures with $M = K = 5$. We consider the same setup as in Example 7.5 with $N_{\mathrm{RF}} = 2$ and $N_{\mathrm{cl}} \in \{2,5\}$. The digital and hybrid architectures provide exactly the same rate when $N_{\mathrm{cl}} = N_{\mathrm{RF}} = 2$. By contrast, there is a large gap at high SNRs when $N_{\mathrm{cl}} = 5$ because the hybrid architecture achieves a multiplexing gain of 2 instead of 5. This result confirms that hybrid beamforming is only a suitable alternative in propagation environments with a small number of clusters, not more than $N_{\mathrm{RF}}$.

The selection of the analog precoding/combining matrices $\mathbf{P}_{\mathrm{RF}}, \mathbf{W}_{\mathrm{RF}}$ is easy when the clusters are located in orthogonal angular directions, as in Example 7.5, where they match the DFT beam directions. This situation is unlikely to arise in practice, which makes the selection more challenging. For example, we might have $N_{\mathrm{cl}} > N_{\mathrm{RF}}$ with the clusters unevenly distributed over the angles, so that we can capture most of the signal power using $N_{\mathrm{RF}}$ carefully selected beam directions. There is a vast literature on this topic [114]. The main principle is that we want to retain as much as possible of the average channel gain over the subcarriers when applying analog precoding/combining. Specifically, when $\mathbf{P}_{\mathrm{RF}}$ and $\mathbf{W}_{\mathrm{RF}}$ have orthonormal columns, it holds that


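$$\frac{1}{S}\sum_{\nu=0}^{S-1}\|\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{H}[\nu]\mathbf{P}_{\mathrm{RF}}\|_F^2 \leq \frac{1}{S}\sum_{\nu=0}^{S-1}\|\mathbf{H}[\nu]\|_F^2 \tag{7.74}$$

since we lose some channel dimensions when using hybrid beamforming. Intuitively, we want to make the gap between the two expressions in (7.74) small. Hence, we want to maximize

$$\frac{1}{S}\sum_{\nu=0}^{S-1}\|\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{H}[\nu]\mathbf{P}_{\mathrm{RF}}\|_F^2 = \frac{1}{S}\sum_{\nu=0}^{S-1}\mathrm{tr}\Big(\mathbf{P}_{\mathrm{RF}}^{\mathrm{H}}\underbrace{\mathbf{H}^{\mathrm{H}}[\nu]\mathbf{W}_{\mathrm{RF}}\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{H}[\nu]}_{\approx\,\mathbf{H}^{\mathrm{H}}[\nu]\mathbf{H}[\nu]}\mathbf{P}_{\mathrm{RF}}\Big) \tag{7.75}$$

$$\approx \mathrm{tr}\left(\mathbf{P}_{\mathrm{RF}}^{\mathrm{H}}\left(\frac{1}{S}\sum_{\nu=0}^{S-1}\mathbf{H}^{\mathrm{H}}[\nu]\mathbf{H}[\nu]\right)\mathbf{P}_{\mathrm{RF}}\right), \tag{7.76}$$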

where the approximation is motivated by considering a receiver that can capture all the signal power in the $N_{\mathrm{RF}}$-dimensional subspace where the transmitted signals exist. Based on this approximation, it follows that the analog precoding matrix $\mathbf{P}_{\mathrm{RF}}$ should be selected based on the average channel matrix expression $\frac{1}{S}\sum_{\nu=0}^{S-1}\mathbf{H}^{H}[\nu]\mathbf{H}[\nu] \in \mathbb{C}^{K \times K}$. More precisely, it should use the $N_{\mathrm{RF}}$ strongest dimensions of this matrix, which are spanned by the eigenvectors associated with its $N_{\mathrm{RF}}$ largest eigenvalues. Since these eigenvectors generally have entries with varying magnitudes that cannot be implemented using PSs, a transformation step is required; we refer to [115] for the precise details. When $\mathbf{P}_{\mathrm{RF}}$ has been selected, one can further argue that $\mathbf{W}_{\mathrm{RF}}$ should be selected to contain the eigenvectors corresponding to the $N_{\mathrm{RF}}$ strongest eigenvalues of $\frac{1}{S}\sum_{\nu=0}^{S-1}\mathbf{H}[\nu]\mathbf{P}_{\mathrm{RF}}\mathbf{P}_{\mathrm{RF}}^{H}\mathbf{H}^{H}[\nu]$, but this can also only be done approximately. When the analog precoding/combining matrices have been selected, the digital precoding/combining matrices are computed separately on each subcarrier based on the SVD of the effective subcarrier channel matrix in (7.70), and the water-filling power allocation is finally performed over all the subcarriers.
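As a rough numerical companion to (7.74)-(7.76), the following Python/NumPy sketch selects $\mathbf{P}_{\mathrm{RF}}$ and $\mathbf{W}_{\mathrm{RF}}$ as the dominant eigenvectors described above and checks the inequality in (7.74). It assumes i.i.d. complex Gaussian subcarrier channels purely for illustration, and the helper names (`dominant_eigvecs`, `select_analog_matrices`, `retained_power`) are hypothetical. The constant-modulus transformation step referred to in [115] is omitted, so the resulting matrices are not directly implementable with PSs.

```python
import numpy as np


def dominant_eigvecs(G, n):
    """Return the n eigenvectors of the Hermitian matrix G with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    return eigvecs[:, order[:n]]


def select_analog_matrices(H_list, n_rf):
    """Unconstrained P_RF and W_RF following the eigenvector argument after (7.76).

    H_list holds the M x K matrices H[nu]. The conversion to constant-modulus
    entries that phase shifters require is deliberately not included.
    """
    S = len(H_list)
    G_tx = sum(H.conj().T @ H for H in H_list) / S  # (1/S) sum H^H H, K x K
    P_rf = dominant_eigvecs(G_tx, n_rf)
    G_rx = sum(H @ P_rf @ P_rf.conj().T @ H.conj().T for H in H_list) / S  # M x M
    W_rf = dominant_eigvecs(G_rx, n_rf)
    return P_rf, W_rf, G_tx


def retained_power(H_list, W_rf, P_rf):
    """Left-hand side of (7.74): (1/S) sum ||W_RF^H H[nu] P_RF||_F^2."""
    return np.mean([np.linalg.norm(W_rf.conj().T @ H @ P_rf, "fro") ** 2 for H in H_list])


# Hypothetical channels: i.i.d. complex Gaussian entries, used only for illustration
rng = np.random.default_rng(1)
M, K, S, n_rf = 8, 8, 16, 2
H_list = [(rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
          for _ in range(S)]

P_rf, W_rf, G_tx = select_analog_matrices(H_list, n_rf)
total = np.mean([np.linalg.norm(H, "fro") ** 2 for H in H_list])  # right-hand side of (7.74)
hybrid = retained_power(H_list, W_rf, P_rf)

# Baseline for comparison: a random K x n_rf matrix with orthonormal columns (via QR)
P_rand, _ = np.linalg.qr(rng.standard_normal((K, n_rf)) + 1j * rng.standard_normal((K, n_rf)))
print(f"(7.74) right side: {total:.2f}, left side with eigenvector design: {hybrid:.2f}")
print("(7.76) objective, eigenvectors vs. random:",
      np.trace(P_rf.conj().T @ G_tx @ P_rf).real,
      np.trace(P_rand.conj().T @ G_tx @ P_rand).real)
```

Because the eigenvectors maximize the trace in (7.76) among all matrices with orthonormal columns, the eigenvector design never loses to the random baseline on that objective; the joint choice of $\mathbf{W}_{\mathrm{RF}}$ remains approximate, as stated above.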


7.3.3 Beam-Squint Effect 


In the clustered multipath propagation model, the channel taps in (7.51) depend on the array response vectors of the ULAs at the transmitter and receiver. The array response vector expression was initially derived in Section 4.2.1 under two assumptions: far-field propagation and frequency flatness. The latter assumption can be invalidated when the bandwidth B is very large because the array response depends on the wavelength, which varies over the signal band. This can lead to issues when using analog beamforming.
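To analyze this phenomenon in detail, we revisit the array response expression in (4.120). We let $\lambda_c = c/f_c$ denote the wavelength at the carrier frequency and assume an antenna spacing of $\Delta = \lambda_c/2$. The array response vector for a signal with frequency $f$ that arrives from the azimuth angle $\varphi$ and elevation angle $\theta$ then becomes

$$
\mathbf{a}_M(\varphi,\theta,f) = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\lambda_c}{2}\frac{f}{c}\sin(\varphi)\cos(\theta)} \\ \vdots \\ e^{-j2\pi(M-1)\frac{\lambda_c}{2}\frac{f}{c}\sin(\varphi)\cos(\theta)} \end{bmatrix} = \begin{bmatrix} 1 \\ e^{-j\pi\frac{f}{f_c}\sin(\varphi)\cos(\theta)} \\ \vdots \\ e^{-j\pi(M-1)\frac{f}{f_c}\sin(\varphi)\cos(\theta)} \end{bmatrix}. \tag{7.77}
$$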
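This is the same expression as in (4.120) when $f = f_c$. When considering OFDM, we are interested in frequencies that can be expressed as $f = f_c + \frac{\nu B}{S}$ for subcarrier indices in the range $\nu \in [-S/2, S/2]$, where $S$ is the number of subcarriers. By substituting this into (7.77), we obtain

$$
\mathbf{a}_M\!\left(\varphi,\theta,f_c+\frac{\nu B}{S}\right) = \begin{bmatrix} 1 \\ e^{-j\pi\left(1+\frac{\nu B}{S f_c}\right)\sin(\varphi)\cos(\theta)} \\ \vdots \\ e^{-j\pi(M-1)\left(1+\frac{\nu B}{S f_c}\right)\sin(\varphi)\cos(\theta)} \end{bmatrix} = \mathrm{diag}\left(1, b[\nu], \ldots, b^{M-1}[\nu]\right)\begin{bmatrix} 1 \\ e^{-j\pi\sin(\varphi)\cos(\theta)} \\ \vdots \\ e^{-j\pi(M-1)\sin(\varphi)\cos(\theta)} \end{bmatrix}, \tag{7.78}
$$

where $b[\nu] = e^{-j\pi\frac{\nu B}{S f_c}\sin(\varphi)\cos(\theta)}$. The last term is the conventional array response vector for a half-wavelength-spaced ULA, while the diagonal matrix shifts the phases of the entries depending on the subcarrier index.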
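The factorization in (7.78) is easy to check numerically. The sketch below assumes half-wavelength spacing at $f_c$, normalizes the frequency as $f/f_c$, and uses arbitrary example values for $M$, $S$, $B/f_c$, and the subcarrier index; the function name `array_response` is our own.

```python
import numpy as np


def array_response(M, phi, theta, f_over_fc):
    """ULA response (7.77) with spacing lambda_c/2, evaluated at f = f_over_fc * f_c."""
    m = np.arange(M)
    return np.exp(-1j * np.pi * m * f_over_fc * np.sin(phi) * np.cos(theta))


M, phi, theta = 20, np.pi / 3, 0.0
S, B_over_fc = 64, 0.1
nu = 17  # arbitrary subcarrier index in [-S/2, S/2]

# Left-hand side of (7.78): response at f = f_c + nu * B / S
a_nu = array_response(M, phi, theta, 1 + nu * B_over_fc / S)

# Right-hand side of (7.78): diag(1, b, ..., b^(M-1)) times the response at f_c
b_nu = np.exp(-1j * np.pi * nu * B_over_fc / S * np.sin(phi) * np.cos(theta))
D = np.diag(b_nu ** np.arange(M))
a_factored = D @ array_response(M, phi, theta, 1.0)

print("max deviation:", np.max(np.abs(a_nu - a_factored)))
```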

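The frequency-dependent array response affects the subcarrier channels in an OFDM system. For example, the $M \times K$ MIMO channel matrix on subcarrier $\nu$ in (7.53) must be revised as

$$
\mathbf{H}[\nu] = \sum_{i=1}^{N_a}\left(\sum_{\ell=0}^{T} c_i[\ell]\, e^{-j2\pi\ell\nu/S}\right)\mathbf{a}_M\!\left(\varphi_{r,i},\theta_{r,i},f_c+\frac{\nu B}{S}\right)\mathbf{a}_K^{T}\!\left(\varphi_{t,i},\theta_{t,i},f_c+\frac{\nu B}{S}\right), \tag{7.79}
$$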


where the two array response vectors now depend on the subcarrier index. We recall that the beamwidth depends on the aperture length compared to the wavelength. Since the physical aperture length is constant in a practical array, the aperture length measured in wavelengths varies over the signal bandwidth, so the beamwidth shrinks or grows across the subcarriers. This is a minor issue when using the digital beamforming architecture because we can then adapt the precoding/combining to the channel conditions on each subcarrier. The frequency dependence is more problematic when using the analog beamforming architecture because it can lead to the beam-squint effect. To showcase the phenomenon, we consider a SIMO channel with $N_a = 1$ cluster where the channel vector on subcarrier $\nu$ is

$$
\mathbf{h}[\nu] = \left(\sum_{\ell=0}^{T} c[\ell]\, e^{-j2\pi\ell\nu/S}\right)\mathbf{a}_M\!\left(\varphi,\theta,f_c+\frac{\nu B}{S}\right). \tag{7.80}
$$

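Suppose the receiver is built using the analog beamforming architecture and applies the MRC vector $\mathbf{w} = \mathbf{a}_M(\varphi,\theta,f_c)/\sqrt{M}$ designed for the carrier frequency $f_c$. The effective channel on subcarrier $\nu$ becomes

$$
\begin{aligned}
\mathbf{w}^{H}\mathbf{h}[\nu] &= \left(\sum_{\ell=0}^{T} c[\ell]\, e^{-j2\pi\ell\nu/S}\right)\frac{\mathbf{a}_M^{H}(\varphi,\theta,f_c)\,\mathrm{diag}\left(1,b[\nu],\ldots,b^{M-1}[\nu]\right)\mathbf{a}_M(\varphi,\theta,f_c)}{\sqrt{M}} \\
&= \left(\sum_{\ell=0}^{T} c[\ell]\, e^{-j2\pi\ell\nu/S}\right)\frac{1}{\sqrt{M}}\sum_{m=1}^{M} b^{m-1}[\nu],
\end{aligned} \tag{7.81}
$$

where we utilized the expression in (7.78). The term in parentheses is also obtained in the single-antenna case, while the rest is due to having multiple antennas. Hence, the beamforming gain on subcarrier $\nu$ is $\frac{1}{M}\left|\sum_{m=1}^{M} b^{m-1}[\nu]\right|^2$, which becomes $M$ if $b[\nu] = 1$, as we normally expect when using MRC. This property holds at the center subcarrier with $\nu = 0$ or when the transmitter is located in a direction where $\sin(\varphi)\cos(\theta) = 0$ (e.g., $\varphi = 0$). In other cases, we get

$$
\frac{1}{M}\left|\sum_{m=1}^{M} b^{m-1}[\nu]\right|^2 = \frac{1}{M}\left|\sum_{m=1}^{M} e^{-j\pi(m-1)\frac{\nu B}{S f_c}\sin(\varphi)\cos(\theta)}\right|^2 = \frac{1}{M}\,\frac{\sin^2\!\left(M\pi\frac{\nu B}{2 S f_c}\sin(\varphi)\cos(\theta)\right)}{\sin^2\!\left(\pi\frac{\nu B}{2 S f_c}\sin(\varphi)\cos(\theta)\right)}, \tag{7.82}
$$

where the last equality follows in the same way as the beamwidth calculation in (4.52). This is generally a decreasing function of the magnitude $|\nu|$ of the subcarrier index, but it can sometimes oscillate. The function also depends on the number of antennas and on the ratio $B/f_c$, which measures how large the bandwidth is compared to the carrier frequency.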
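To confirm that the direct sum in (7.81) and the closed form in (7.82) agree, the sketch below evaluates both over all subcarriers. It assumes $M = 20$, $B/f_c = 0.1$, $\varphi = \pi/3$, and $\theta = 0$ as example values, and an arbitrary $S = 64$; the function names are hypothetical.

```python
import numpy as np


def gain_direct(M, nu, S, B_over_fc, phi, theta=0.0):
    """(1/M)|sum_{m=1}^M b^(m-1)[nu]|^2: the MRC gain on subcarrier nu, from (7.81)."""
    b = np.exp(-1j * np.pi * nu * B_over_fc / S * np.sin(phi) * np.cos(theta))
    return np.abs(np.sum(b ** np.arange(M))) ** 2 / M


def gain_closed_form(M, nu, S, B_over_fc, phi, theta=0.0):
    """Ratio of squared sines on the right-hand side of (7.82)."""
    x = np.pi * nu * B_over_fc / (2 * S) * np.sin(phi) * np.cos(theta)
    if np.isclose(np.sin(x), 0.0):
        return float(M)  # limiting value: b[nu] = 1, so all M terms add coherently
    return np.sin(M * x) ** 2 / (M * np.sin(x) ** 2)


M, S, B_over_fc = 20, 64, 0.1
nus = range(-S // 2, S // 2 + 1)
direct = [gain_direct(M, nu, S, B_over_fc, np.pi / 3) for nu in nus]
closed = [gain_closed_form(M, nu, S, B_over_fc, np.pi / 3) for nu in nus]
print("max mismatch:", max(abs(d - c) for d, c in zip(direct, closed)))
print(f"gain at band edge: {direct[0]:.2f} (vs. M = {M})")
```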

Figure 7.15 shows the beamforming gain in (7.82) at different subcarriers with indices $\nu \in [-S/2, S/2]$, where $S$ is the total number of subcarriers. We consider a setup with $M = 20$ antennas, $B/f_c = 0.1$, and transmitters located in the azimuth plane ($\theta = 0$) in three directions: $\varphi \in \{0, \pi/6, \pi/3\}$. The beamforming gain is the same on all subcarriers if $\varphi = 0$, but for all other directions, the gain is reduced as $|\nu|$ increases. The reason is that MRC is supposed to compensate for the phase differences between adjacent antennas, and these vary substantially between subcarriers when the bandwidth is large compared to the carrier frequency. The figure shows that there can be several dB of gain loss at the edge of the band. The results shown in this figure could appear in the mid-band if $f_c = 3$ GHz and $B = 300$ MHz, or in the mmWave band if $f_c = 30$ GHz and $B = 3$ GHz. Those specific bandwidth values might be larger than what practical systems use; it is more typical to consider a third of them so that $B/f_c = 0.1/3$. In that case, the gain losses observed in the middle third of the figure (i.e., $\nu \in [-S/6, S/6]$) should be anticipated in analog beamforming systems.

Figure 7.15: The beamforming gain in (7.82) achieved at different subcarriers in a setup with $M = 20$, $B/f_c = 0.1$, and $\theta = 0$.
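To put numbers on the band-edge losses discussed around Figure 7.15, the sketch below evaluates (7.82) at $\nu = S/2$ for the three directions, assuming $\theta = 0$ and $M = 20$ as in the figure. It also checks the scaling argument behind the middle-third remark: the gain depends on $\nu$, $B$, $S$, and $f_c$ only through $\frac{\nu B}{S f_c}$. The function name `gain` is hypothetical.

```python
import numpy as np


def gain(M, nu_over_S, B_over_fc, phi):
    """Gain (7.82) with theta = 0, written in terms of the normalized index nu/S."""
    x = np.pi * nu_over_S * B_over_fc / 2 * np.sin(phi)
    if np.isclose(np.sin(x), 0.0):
        return float(M)
    return np.sin(M * x) ** 2 / (M * np.sin(x) ** 2)


M = 20
for phi in (0.0, np.pi / 6, np.pi / 3):
    edge = gain(M, 0.5, 0.1, phi)  # nu = S/2 with B/f_c = 0.1
    print(f"phi = {phi:.3f} rad: band-edge loss {10 * np.log10(M / edge):.2f} dB")

# With B/f_c = 0.1/3, the band edge nu = S/2 sees the same gain as nu = S/6
# in the B/f_c = 0.1 setup of Figure 7.15.
edge_narrow = gain(M, 0.5, 0.1 / 3, np.pi / 3)
third_wide = gain(M, 1 / 6, 0.1, np.pi / 3)
print(np.isclose(edge_narrow, third_wide))
```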

The noun “squint” is used to describe a mismatch in the directions that a person’s eyes are pointing. A similar directional mismatch causes beam-squint. Figure 7.16 shows the beamforming gain

$$
\frac{1}{M}\left|\mathbf{a}_M^{H}(\varphi,0,f_c)\,\mathbf{a}_M\!\left(\pi/3,0,f_c+\frac{\nu B}{S}\right)\right|^2 \tag{7.83}
$$

obtained using MRC vectors with different observation angles $\varphi$ when the true signal arrives from the angle $\pi/3$. We still consider $M = 20$ and $B/f_c = 0.1$. The beam pattern has its peak at the correct angle $\varphi = \pi/3$ at the center frequency ($\nu = 0$), but when we consider subcarriers further from the center, the pattern is shifted away from the correct angle; that is, the beam does not point in the direction we expect. With analog beamforming, we would use $\varphi = \pi/3$ on all subcarriers, leading to the gain losses observed previously since the actual beam direction changes.
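The direction of the shift can be made explicit. With $\theta = 0$, the inner product in (7.83) equals $\sum_{m=1}^{M} e^{j\pi(m-1)\left(\sin(\varphi) - \frac{f}{f_c}\sin(\pi/3)\right)}$ with $f = f_c + \frac{\nu B}{S}$, whose magnitude reaches $M$ when $\sin(\varphi) = \frac{f}{f_c}\sin(\pi/3)$. Subcarriers with $\nu > 0$ therefore peak at angles larger than $\pi/3$, and those with $\nu < 0$ at smaller angles. The sketch below checks this numerically for the setup of Figure 7.16, assuming a uniform angle grid on $[0, \pi/2]$; `squint_pattern` is a hypothetical helper.

```python
import numpy as np


def squint_pattern(M, phi_grid, phi_true, f_over_fc):
    """Gain (7.83): MRC designed at f_c for each angle in phi_grid, signal from phi_true at f."""
    m = np.arange(M)
    A = np.exp(-1j * np.pi * np.outer(np.sin(phi_grid), m))  # row i: a_M(phi_grid[i], 0, f_c)
    a_true = np.exp(-1j * np.pi * m * f_over_fc * np.sin(phi_true))  # a_M(phi_true, 0, f)
    return np.abs(A.conj() @ a_true) ** 2 / M


M, B_over_fc, phi_true = 20, 0.1, np.pi / 3
phi_grid = np.linspace(0.0, np.pi / 2, 20001)
peaks = {}
for nu_over_S in (-0.5, 0.0, 0.5):
    f_over_fc = 1 + nu_over_S * B_over_fc
    pattern = squint_pattern(M, phi_grid, phi_true, f_over_fc)
    peaks[nu_over_S] = phi_grid[np.argmax(pattern)]
    predicted = np.arcsin(min(1.0, f_over_fc * np.sin(phi_true)))
    print(f"nu/S = {nu_over_S:+.1f}: peak at {np.degrees(peaks[nu_over_S]):.2f} deg, "
          f"predicted {np.degrees(predicted):.2f} deg")
```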
Figure 7.16: The beamforming gain in (7.83) obtained using MRC vectors with different angles $\varphi$. The pattern is shown for different subcarrier indices when the signal arrives from the azimuth angle $\pi/3$, $M = 20$, and $B/f_c = 0.1$.


The beam-squint effect is present in the analog beamforming architecture and limits how large bandwidths can be used effectively. When a signal arrives from the angle $\varphi$, the propagation delay $d_m/c$ at receive antenna $m$ is frequency-independent and only depends on the propagation distance $d_m$ and the speed of light $c$. However, the phase-shift $2\pi d_m/\lambda$ is not, since the wavelength depends on the frequency. Conventional PSs nevertheless assign the same phase-shift to the entire signal band, giving rise to the described beam-squint. An implementation solution is to replace the PSs with more complex true time delay (TTD) units that assign the same delay to all frequencies; this alleviates the beam-squint effect when the signal is transmitted/received in a single direction. However, it does not address the general limitation of analog beamforming when it comes to multipath propagation.
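The difference between PSs and TTD units can be illustrated with a small sketch. It assumes ideal, continuously tunable delays without quantization or other hardware impairments, a single path from $\varphi = \pi/3$ with $\theta = 0$, and the mid-band example $f_c = 3$ GHz and $B = 300$ MHz mentioned earlier; all function names are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light (m/s)


def antenna_delays(M, phi, fc):
    """Per-antenna delays (m-1) * (lambda_c/2) * sin(phi) / c for a ULA with spacing lambda_c/2."""
    spacing = C / fc / 2
    return np.arange(M) * spacing * np.sin(phi) / C


def ps_weights(M, phi, fc, f):
    """Phase shifters: phases computed at f_c and reused unchanged at every frequency f."""
    return np.exp(-1j * 2 * np.pi * fc * antenna_delays(M, phi, fc)) / np.sqrt(M)


def ttd_weights(M, phi, fc, f):
    """Ideal true time delays: fixed delays, so the resulting phase grows linearly with f."""
    return np.exp(-1j * 2 * np.pi * f * antenna_delays(M, phi, fc)) / np.sqrt(M)


def received_gain(w, M, phi, fc, f):
    """|w^H a_M(phi, 0, f)|^2 with a_M from (7.77) (theta = 0)."""
    a = np.exp(-1j * 2 * np.pi * f * antenna_delays(M, phi, fc))
    return np.abs(np.vdot(w, a)) ** 2


M, phi, fc, B = 20, np.pi / 3, 3e9, 300e6  # mid-band example from the text
for f in (fc - B / 2, fc, fc + B / 2):
    g_ps = received_gain(ps_weights(M, phi, fc, f), M, phi, fc, f)
    g_ttd = received_gain(ttd_weights(M, phi, fc, f), M, phi, fc, f)
    print(f"f = {f / 1e9:.2f} GHz: PS gain {g_ps:.2f}, TTD gain {g_ttd:.2f}")
```

Since the TTD weights equal $\mathbf{a}_M(\varphi, 0, f)/\sqrt{M}$ at every frequency, the gain stays at $M$ across the band, whereas the PS weights reproduce the losses in (7.82). This only helps for one propagation direction, in line with the limitation noted above.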


7.4 Practical Implementations and Terminology 


MIMO communication technology has existed for decades, but in the 5G era, 
it switched from being an optional high-end feature to becoming mainstream. 
It is utilized in both mid-band and mmWave deployments, at both base 
stations and user devices. In this section, we will take a look at two specific 
implementations, highlight some practical design characteristics, and shed 
light on a few ambiguities that exist in academic and industrial terminology. 

Figure 7.17 shows a mmWave transmitter designed for the 28 GHz band. It consists of 16 single-polarized antenna elements, arranged on a 4 × 4 square grid. The horizontal and vertical element spacings are $\lambda/2 \approx 5.3$ mm, so this is a critically spaced array. Four RF inputs are visible at the bottom of the photo, and these are connected to the antenna elements so that elements in the same column always send the same signal. The set of elements that share the same RF input is called a subarray in the industry; however, from the perspective of this book, each column corresponds to a single antenna. Hence, this is a horizontal ULA with half-wavelength spacing, but with directive antennas implemented using subarrays of four elements each. Each individual element has a 3 dBi directivity gain, while each antenna has a $3 + 10\log_{10}(4) \approx 9$ dBi gain. The extra 6 dB comes from the fixed “beamforming” gain obtained when feeding four elements with the same signal. The consequence of this design is that the array has a limited beamwidth both horizontally and vertically, but it can only control the beamforming in the horizontal plane. This is sufficient when the transmitter and prospective receivers are located in roughly the same plane (e.g., a person carrying a device in a room or along a street). The four RF inputs are connected to individual PSs located behind the antenna array; thus, this device uses the analog beamforming architecture.

Figure 7.17: The photo shows the antenna array in the TMYTEK Developer Kit from 2022, which is designed for the 28 GHz band. It consists of 16 antenna elements, which are arranged into subarrays. Each column is a subarray that shares an RF input and, therefore, always transmits the same signal. Hence, from a MIMO communication perspective, this is a horizontal ULA with directive antennas, and it uses an analog beamforming architecture.
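The numbers quoted for this array follow from two one-line calculations, sketched below with the exact speed of light in vacuum; the helper names are our own.

```python
import math

C = 299_792_458.0  # speed of light (m/s)


def half_wavelength_mm(fc_hz):
    """Half-wavelength element spacing in millimeters."""
    return C / fc_hz / 2 * 1e3


def subarray_gain_dbi(element_gain_dbi, n_elements):
    """Element gain plus the fixed gain 10*log10(n) from feeding n elements with the same signal."""
    return element_gain_dbi + 10 * math.log10(n_elements)


spacing = half_wavelength_mm(28e9)
antenna_gain = subarray_gain_dbi(3.0, 4)
print(f"lambda/2 at 28 GHz: {spacing:.2f} mm")
print(f"antenna (4-element subarray) gain: {antenna_gain:.2f} dBi")
```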

Figure 7.18 shows a base station array for the 3.5 GHz band with 32 RF input/output signals, which is referred to as 32T32R. This kind of product is marketed as “Massive MIMO”. If we look inside the box, it contains 64 dual-polarized antenna elements arranged on an 8 × 8 grid, so the total number of elements is 128. These elements are arranged into subarrays, each consisting of four vertically stacked elements having the same polarization. Hence, using the terminology of this book, we are considering a UPA with 8 dual-polarized antennas per row and 2 dual-polarized antennas per column. Each dual-polarized antenna uses ±45° polarizations, which are illustrated using red and blue colors in the figure. This subarray arrangement gives the maximum beamforming capability in the horizontal plane for the given 8 × 8 element grid. However, it has a limited ability to change the vertical beam directivity. A product of this kind is meant for deployments in geographical areas with low-rise buildings, where the base station sees all the users and multipath clusters from roughly the same elevation angle, so there is no need for drastically changing the vertical beam directivity. There are other base station products with the same number of elements but more RF inputs/outputs, each connected to PAs/LNAs, DACs/ADCs, filters, etc. These products are thicker, heavier, and more expensive, but are capable of spatial multiplexing of users on different floors in high-rise buildings. This particular product weighs 12 kg, is implemented using the digital architecture, and has all the components integrated into a box with the dimensions 0.5 m × 0.7 m. It is clear that the word “massive” refers to the number of antennas, not the weight or size. This base station array is rectangular, although the dual-polarized elements are deployed on an 8 × 8 grid. The reason is that the horizontal element spacing is $\lambda/2$, while the vertical element spacing is $0.7\lambda$. The latter is a sparsely spaced array configuration that reduces the vertical beamwidth at the expense of occasionally creating grating lobes, but these point into the sky, where they cause no interference to users on the ground.

Figure 7.18: The photo shows the Ericsson AIR 3268 base station from 2021. It is designed for the 3.5 GHz band and has 32 antennas (16 dual-polarized antennas) and RF/BBU hardware integrated into the box, following the digital architecture. Each antenna is designed as a subarray with four vertically stacked elements having the same polarization. The array is dual-polarized and each element location contains two elements with orthogonal polarizations (±45°). This product supports a bandwidth of 200 MHz, a total transmit power of 200 W, and passive cooling.
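The element and port counts of the base station in Figure 7.18, and the rectangular shape of its array, can be reproduced with simple arithmetic. The sketch assumes that the aperture in each dimension is roughly the number of element positions times the spacing, ignoring edge effects and the enclosure; `count_ports` is a hypothetical helper.

```python
C = 299_792_458.0  # speed of light (m/s)


def count_ports(rows, cols, polarizations, elements_per_subarray):
    """Total elements and RF ports when co-polarized elements are grouped into subarrays."""
    elements = rows * cols * polarizations
    return elements, elements // elements_per_subarray


elements, ports = count_ports(rows=8, cols=8, polarizations=2, elements_per_subarray=4)

# Rough aperture estimate (assumption): number of element positions times the spacing
wavelength = C / 3.5e9
width_m = 8 * wavelength / 2     # horizontal spacing lambda/2
height_m = 8 * 0.7 * wavelength  # vertical spacing 0.7 lambda
print(f"{elements} elements, {ports} ports (32T32R)")
print(f"approximate aperture: {width_m:.2f} m wide x {height_m:.2f} m tall")
```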
Figure 7.19: Multiple antennas have been gradually integrated into cellular technology. Fixed beams are used in 1G-3G, while 4G uses horizontal dual-polarized ULAs and 5G uses dual-polarized UPAs. Subarrays of the kind illustrated in Figure 7.18 are used in all three cases.


7.4.1 Evolution of Cellular Technology 


The base station technology has thus far evolved in three main steps toward integrating MIMO technology. A traditional base station is illustrated in Figure 7.19(a) and has a fixed radiation pattern, which is broad in the horizontal plane but relatively narrow vertically. The base station can thereby aim signals toward the ground and cover a 120° sector where the intended users reside. This radiation pattern is typically achieved using a single subarray with multiple vertically stacked antenna elements, resulting in fixed beamforming. The base station might have dual-polarized antennas, which enables polarization diversity. 1G-3G technology featured such base stations.

Figure 7.19(b) illustrates how basic MIMO features were enabled in 4G by deploying two traditional dual-polarized antennas next to each other horizontally. The horizontal beamwidth can then be halved compared to Figure 7.19(a) and the beamforming gain doubled. The typical array configuration is a dual-polarized horizontal ULA where each antenna consists of a subarray with multiple vertically stacked antenna elements. The directivity can only be adapted in the horizontal plane, which is called 2D beamforming. The 4G standard supports basic spatial multiplexing and diversity features.

Figure 7.19(c) shows a typical 5G base station configuration with a UPA that enables both horizontal and vertical beamforming, so-called 3D beamforming. The illustrated configuration is the same as in Figure 7.18, which contains subarrays because many telecom operators want the beamforming gain provided by having many antenna elements but also want to save costs by reducing the number of RF components. The 5G MIMO implementation is called Massive MIMO and supports beamforming, diversity, and spatial multiplexing. One reason that 5G can support many more antennas than in the past is that all the components in the digital beamforming architecture in Figure 7.9, except the BBU, can nowadays be integrated into a single box. In previous generations, each RF chain required a separate box, which made MIMO bulky and heavy. 5G base stations for mmWave frequencies can be similar to the 4G example, except that each subarray is an analog beamforming array.
7.4.2 MIMO-Related Terminology


The history of multiple antenna communications spans more than a century, 
and several ambiguities in the terminology have appeared along the way. One 
reason is that different people prefer different terms for roughly the same 
concepts. Another reason is that the MIMO functionalities and use cases 
have expanded with time, which raises the question of whether one should 
generalize existing terms to cover these changes or make up new terms. We 
have selected a particular terminology in this book and tried to define its 
meaning rigorously, but in this section, we will describe additional terms and 
briefly explain what different meanings they might have. 

Antenna port: The mapping between physical antenna elements and 
what we call “antennas” in the baseband processing can be rather complicated 
and implementation-specific, as exemplified by the subarrays in Figures 7.17 
and 7.18. Therefore, 3GPP uses the term antenna port to refer to what is 
perceived as an antenna in the BBU; in other words, a typical MIMO channel 
in this book has K antenna ports at the transmitter and M antenna ports at 
the receiver. How these “logical” antennas are mapped to physical antenna elements does not need to be standardized, as long as we have a mechanism to obtain the corresponding channel matrix $\mathbf{H}$. It is even possible for a practical MIMO system to vary its number of antenna ports over time by changing how many elements are grouped into each subarray with a common logical antenna port. The traffic and device capabilities might trigger these changes.

Beamsteering: This refers to the mechanism of varying the angular 
direction of the beam transmitted from an antenna array. This can be achieved
using the analog, hybrid, or digital architecture. The term is usually used
when the beam direction is changed over time to cover different geographical 
regions, but without aiming the beam at a known user location. This feature 
is used for broadcasting common messages over different parts of the coverage 
area (as discussed in Section 4.3.3) or for scanning an area in radar applications. 

Beamforming, precoding, combining: This refers to the tuning of amplitudes and phases in antenna arrays to achieve directional signal transmission and reception that maximize communication performance. Beamforming was originally considered in LOS scenarios, where the optimal design creates beams that point in specific angular directions leading to the intended receivers. When the concept is applied in NLOS scenarios, MRT instead results in sending a signal with no apparent angular directivity. These two cases are illustrated in Figure 7.20. Some people also use the term beamforming in NLOS scenarios, while others prefer to call it generalized beamforming to highlight that the transmission has an entirely different physical shape than in LOS scenarios. In this book, we avoid using the term beamforming in NLOS scenarios to limit the risk of confusion. Instead, we have used the generalized terms transmit precoding and receive combining. The SNR gain that is obtained when focusing signals using multiple antennas is called beamforming gain, array gain, aperture gain, or power gain.

Figure 7.20: The SNR-maximizing transmission takes a different physical shape in LOS and NLOS scenarios because it is based on the channel. Some people use “beamforming” to refer to both cases, while others use the terms precoding or generalized beamforming in the NLOS case.

Multi-stream or multi-user beamforming: When spatial multiplexing is used in point-to-point MIMO scenarios, each signal is transmitted and received using a different “beam”, which might not point in specific angular directions. This is sometimes called multi-stream or multi-layer beamforming, to extend the classical beamforming terminology further. In this book, we have instead used the precoding and combining terms, and we let the power allocation (e.g., water-filling) determine how many parallel signals are transmitted and received. The signals can be called streams or layers. Similarly, some people refer to the transmission and reception in multi-user MIMO scenarios as multi-user beamforming. Furthermore, the term precoding is sometimes viewed as a combination of beamforming (selection of the signal direction) and power allocation (distribution of power between different beams).

Full-dimensional and three-dimensional beamforming: This refers to beamforming using UPAs or other array geometries that can control the beam directivity both horizontally and vertically. The industry introduced the term to highlight this new feature in their product lines because the first multiple antenna features in 3G and 4G systems used horizontal ULAs that were only capable of beamforming in the two-dimensional horizontal plane.

Block-level and symbol-level precoding: The previous chapters described block-level precoding, where a fixed set of precoding vectors is used for a block of data symbols. Specifically, the transmitted signal was expressed as $\sum_{i} \mathbf{p}_i x_i$, where the value of the symbol $x_i$ changes at every time instance based on the user data, while the precoding vector $\mathbf{p}_i$ only depends on the channels and is fixed for as long as the channels are. This structure is capacity-achieving when an infinitely large block of data is transmitted, but other options can be considered when transmitting a finite data block in practice. In symbol-level precoding [116], the precoding vectors change at every time instance, based on which data symbols will be transmitted. Instead of sending signals through fixed beams, the transmission is optimized so that each receiver observes a signal that is seemingly interference-free and as close to the transmitted constellation point as possible. The shape of the decision regions for the constellation points is exploited to accept interference when it will not increase the decoding error probability. The downside of symbol-level precoding is that the precoding optimization is computationally complex and must be redone at every symbol time instance.

Spatial layers: The parallel data streams that are spatially multiplexed to one or multiple devices are called spatial layers in 3GPP standards. In theory, the maximum number of spatial layers $r = \min(M, K)$ is determined by the number of antenna ports, but it can be smaller in practice. The number of orthogonal pilot sequences is predefined by the standard and determines the maximum number of spatial layers, because we need a mechanism to estimate each column of the effective channel matrix $\mathbf{H}\mathbf{P}$ that is obtained when applying the precoding matrix. Hence, once the standard has been defined, we can build base stations and devices with arbitrarily many antenna elements and antenna ports, but the maximum number of spatial layers remains fixed. On the other hand, standards are revised when needed to utilize new functionalities, so adding more antenna ports and supporting more spatial layers typically go hand-in-hand.

Null-steering, MMSE, and other linear precoding schemes: The optimal linear precoding vectors are given by (6.129), but they depend on the virtual uplink power coefficients that are generally challenging to compute for a given performance metric (e.g., maximum sum rate). The TWF, RZF, and ZF schemes were described in Section 6.4.5 as simplifications of the optimal precoding. One can find many other heuristic/simplified precoding schemes in the literature [85, Remark 3.2], having names such as null-steering, SLNR precoding, multi-cell MMSE precoding, minimum-variance distortionless response (MVDR) precoding, and virtual SINR beamforming. These schemes are motivated through (slightly) different heuristic arguments, which are often connected to the uplink-downlink duality. Nevertheless, they usually perform roughly the same and are nearly optimal, so it can be puzzling that there are so many names for almost the same thing. There are fewer alternative uplink schemes since MMSE combining is optimal for any performance metric. However, the MMSE scheme has alternative names, such as MVDR beamforming and interference-rejection combining.

Holographic MIMO: This term is used to describe densely spaced antenna arrays with antenna spacings much smaller than $\lambda/2$. The small spacing leads to spatial oversampling and mutual coupling effects. The latter can be exploited to achieve superdirectivity, with beamforming gains that can be larger than usual in specific directions. The holographic terminology indicates that a densely spaced array can be implemented by having a surface with a specific impedance pattern (i.e., the hologram) that is illuminated by a reference wave from a nearby antenna to generate an emitted wave [117], [118]. Each desired wavefront corresponds to a specific hologram that can be synthesized if the surface contains a dense grid of dielectric microstructures. The term large intelligent surface has also been used for similar purposes [76]. There exist commercial metamaterial antennas inspired by the holographic principle for both terrestrial and satellite communications, but they are used as a way to implement the analog beamforming architecture without traditional PSs.

Figure 7.21: An ELAA consists of many antennas distributed over a huge aperture area, here exemplified as the facade of a building. The antenna spacing can be larger than in conventional arrays (e.g., one antenna per window) since the goal is to achieve tiny “beams” that have a finite depth and can be focused on individual user devices.

Extremely large aperture array (ELAA): This term was coined 
in [118] to refer to antenna arrays with an aperture size that is very large 
compared to the wavelength. Unlike in holographic MIMO, the antenna
spacing might be larger than in conventional arrays. The motivation for
the vast aperture is that the prospective receivers will be in its radiative
near-field, where the propagation phenomena differ from the conventional
far-field models. In particular, “beams” can be focused in both angle and
depth, thereby creating an elliptical region with a strong beamforming gain
around the intended receiver [119]. This could enable spatial multiplexing of 
very many devices and data streams per device as a way to manage more 
traffic without requiring more bandwidth. An example skyscraper deployment 
is illustrated in Figure 7.21. Another option is to deploy circular arrays in the 
radiative near-field, which is called orbital angular momentum (OAM) because 
it results in helical beams [120]. The MIMO capacity expressions from earlier 
chapters can still be utilized in these cases, but the far-field approximations 
cannot be used when computing the channel matrix H. Indications of the 
propagation phenomena that appear in the radiative near-field were provided 
in Sections 4.4.2 and 4.4.3, which showed how high-rank channel matrices can 
be achieved in LOS conditions when using distributed or large arrays. 
7.5 Exercises


Exercise 7.1. Consider a SISO-OFDM channel with T + 1 taps, S subcarriers, and $h[\ell] = 1$ for all $\ell \in \{0, \ldots, T\}$. The subcarrier spacing is $B/S = 15$ kHz.

(a) Compute the subcarrier channels $h[\nu]$, for $\nu \in \{0, \ldots, S-1\}$.

(b) Express the channel gains $|h[\nu]|^2$, for $\nu \in \{0, \ldots, S-1\}$, in terms of sinusoidal functions. Hint: Rewrite the expression using (4.52).

(c) Assume that T = 3 and S = 32. The maximum channel gain in (b) is obtained at $\nu = 0$. The first-null coherence bandwidth can be measured as $2(B/S)\nu^\star$, where $\nu^\star$ is the smallest subcarrier index in $\{1, \ldots, 31\}$ for which $h[\nu] = 0$. This is the frequency interval (in Hz) between the two nulls surrounding the peak at $\nu = 0$. What is the coherence bandwidth in this setup?

(d) What is the first-null coherence bandwidth if T = 7 and S = 32? Is it smaller or larger than in (c)?


Exercise 7.2. Prove the identity in (7.68): |P]? = ||Vss[2]|Iz- 


Exercise 7.3. Suppose the pulse used in the PAM is selected such that

$$
(p * p)(t) = \begin{cases} 1 & \text{for } |t| \leq 1/B, \\ 3 - 2B|t| & \text{for } 1/B < |t| \leq 1.5/B, \\ 0 & \text{otherwise}. \end{cases} \tag{7.84}
$$

Recall that $(p * p)(t)$ appears in (2.126) when computing the coefficients of a multipath channel. Consider a channel with three propagation paths having the lengths 30 m, 45 m, and 108 m, respectively. The bandwidth is B = 20 MHz and the carrier frequency is $f_c = 3$ GHz.

(a) What is the delay spread $T_{\mathrm{spread}}$?

(b) Determine the sampling delay according to (7.5).

(c) Compute the channel taps $h[\ell]$, for $\ell \in \{0, \ldots, T\}$, by sampling (2.126) using the sampling delay from (b). The attenuations $\alpha_1$, $\alpha_2$, and $\alpha_3$ have arbitrary values.


Exercise 7.4. Consider the 5G NR subcarrier spacings 15, 30, and 60 kHz, and an OFDM setup with S = 4000 subcarriers, $T_{\mathrm{spread}} = 4\ \mu$s, and $T \approx B T_{\mathrm{spread}}$.

(a) Which of the three subcarrier spacings minimizes the OFDM symbol duration while ensuring that the cyclic prefix does not increase the signal resource utilization (i.e., the complex degrees of freedom) by more than 20%?

(b) What is the total bandwidth when using the subcarrier spacing from (a)?

Exercise 7.5. Consider a SISO-OFDM channel with T + 1 = 4 taps and S = 32 subcarriers. Each channel tap features independent and identically distributed Rayleigh fading: $h[\ell] \sim \mathcal{N}_{\mathbb{C}}(0, \beta/4)$.

(a) Compute the correlation between the frequency-domain channel coefficients at two different subcarriers $\nu$ and $\nu'$.

(b) How does the squared magnitude of the correlation vary as the difference $|\nu - \nu'|$ increases?
Exercise 7.6. Consider a SISO-OFDM channel with S subcarriers and T + 1 channel taps. We would like to estimate the channels on all subcarriers, and therefore, the deterministic symbol $\sqrt{q}$ is transmitted on all subcarriers. The received signal on subcarrier $\nu \in \{0, \ldots, S-1\}$ is

$$
y[\nu] = \sqrt{q}\, h[\nu] + n[\nu], \tag{7.85}
$$

where $h[\nu]$ is given in (7.15) and $n[\nu] \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent receiver noise. We will follow the estimation methodology from Section 4.2.4.

(a) Suppose the channel on subcarrier $\nu$ is estimated as $\hat{h}[\nu] = y[\nu]/\sqrt{q}$. What is the variance of the estimation error $h[\nu] - \hat{h}[\nu]$? What is the total error variance across the S subcarriers?


(b) The S subcarrier channels h = [h[0],..., h[S — 1]]* are determined by the T +1 
channel taps h = [h[0],..., h[T]]7 as 


h = Fs roth, (7.86) 


where $\mathbf{F}_{S,T+1} \in \mathbb{C}^{S \times (T+1)}$ contains the first T + 1 columns of the DFT matrix
$\mathbf{F}_S$. Suppose we estimate the time-domain channel taps as


1 pu 


i= ST 
vi SH 


[y[O],.--, giS — 1]]* (7.87) 


and then transform it to an estimate of h as h = F s,r+1h. What is the total error 
variance across the S subcarriers? Hint: Fs,r+1Fs,r41h =h. 


(c) Suppose S = 2000 and T + 1 = 20. How large is the difference between the total 
error variances in (a) and (b)? Explain the difference. 


Exercise 7.7. Consider the dual-polarized array shown in Figure 7.18 and assume it consists of isotropic antenna elements.

(a) What are the horizontal and vertical first-null beamwidths (in radians) in the broadside direction?

(b) Consider a receiver located 50 m from the array in the broadside direction. How wide is the beam in meters in the horizontal and vertical directions?

(c) Consider another 8 × 2 UPA consisting of isotropic antennas with no subarrays. The horizontal antenna spacing is $0.5\lambda$ and the vertical antenna spacing is $0.7\lambda$; thus, the array aperture is smaller than in Figure 7.18. What are the horizontal and vertical first-null beamwidths of this array in the broadside direction? Compare the beamwidths and the maximum beamforming gain with those obtained by the original array in Figure 7.18.


Exercise 7.8. Consider a hybrid ULA architecture with $M = K$ antennas and $N_{\mathrm{RF}}$
RF inputs/outputs. What rate can be achieved over the channel in (7.47) if each of
$\mathbf{A}_{\mathrm{r}}$ and $\mathbf{A}_{\mathrm{t}}$ contains $N_a \geq N_{\mathrm{RF}}$ columns from the scaled DFT matrix $\sqrt{M}\mathbf{F}_M$, and
$c_i[\ell] = \sqrt{\beta_0 e^{-\ell/\tau}}$ if $\ell = i - 1$ and $c_i[\ell] = 0$ otherwise, where $\beta_0 > 0$ is a constant and
$\tau > 0$ specifies the power-decay behavior? Assume that $N_{\mathrm{RF}}$ data streams are transmitted
with equal power allocation.
Exercise 7.9. Consider the channel in (7.60) with $N_a = M = K$. Suppose $\mathbf{A}_{\mathrm{r}}$ and $\mathbf{A}_{\mathrm{t}}$
are two DFT matrices. Each cluster only appears in one specific channel tap, such that
$c_i[\ell] = \sqrt{\beta_i}$ for $\ell + 1 = i$, but $c_i[\ell] = 0$ otherwise. The channel gains $\beta_1, \ldots, \beta_{N_a}$ will be
treated as variables that can be selected freely under the constraint $\sum_{i=1}^{N_a} \beta_i = \beta$, where
$\beta$ is the total channel gain.


(a) For a given value of $\beta$, which selection of $\beta_1, \ldots, \beta_{N_a}$ maximizes the achievable
rate at low SNRs? Answer this question for three different architectures: analog
beamforming, hybrid beamforming with $N_{\mathrm{RF}} < N_a$ RF inputs/outputs, and
digital beamforming.


(b) Repeat (a) but consider the achievable rate at high SNRs. 


Exercise 7.10. Consider a MIMO-OFDM channel with T + 1 = 2 channel taps, S > 2 
subcarriers, and M = K = 2 antennas. The channel matrices are 


H[0] = /8 i l , Hü] = 8 E a . (7.88) 


(a) Compute the MIMO-OFDM capacity in terms of bits per OFDM symbol.
(b) What is the capacity-achieving input distribution on subcarrier $\nu$?
Exercise 7.11. Consider a Rician fading MIMO-OFDM channel with $T + 1 = 2$ taps.
The first tap is the LOS path and the second tap is an i.i.d. fading matrix. Using the
$\kappa$-factor notation from Example 5.2, the two taps of this channel are defined as

$$ \mathbf{H}[0] = \sqrt{\frac{\kappa}{\kappa+1}}\,\mathbf{a}_M(\varphi_{\mathrm{r}}, \theta_{\mathrm{r}})\mathbf{a}_K^{\mathrm{T}}(\varphi_{\mathrm{t}}, \theta_{\mathrm{t}}), \qquad \mathbf{H}[1] = \sqrt{\frac{1}{\kappa+1}}\,\mathbf{H}_{\mathrm{iid}}, \tag{7.89} $$

where the entries of $\mathbf{H}_{\mathrm{iid}} \in \mathbb{C}^{M \times K}$ are i.i.d. $\mathcal{N}_{\mathbb{C}}(0, 1)$-distributed. Suppose an analog
beamforming architecture is used, and the transmit precoding and receive combining
are based on the LOS path in $\mathbf{H}[0]$. What fraction of the total channel gain $MK$ will
be received on average? Is it an increasing or decreasing function of the $\kappa$-factor?


Exercise 7.12. When using analog beamforming and large bandwidth, the beam-squint
effect can change the beam direction at the edges of the signal bandwidth. Consider the
beamforming gain in (7.83) with $B/f_c = 0.1$.


(a) At what observation angle $\varphi$ is the gain maximized? Hint: The answer is an
expression that depends on $\nu/S$.
(b) Does the answer in (a) depend on $M$?
(c) How many degrees is the beam shifted if $\nu/S = 1/2$?
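Exercise 7.13. The approximation $\mathbf{H}^{\mathrm{H}}[\nu]\mathbf{W}_{\mathrm{RF}}\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{H}[\nu] \approx \mathbf{H}^{\mathrm{H}}[\nu]\mathbf{H}[\nu]$ is used in (7.75).
Quantify the approximation error by computing the squared Frobenius norm of the
difference

$$ \left\|\mathbf{H}^{\mathrm{H}}[\nu]\mathbf{W}_{\mathrm{RF}}\mathbf{W}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{H}[\nu] - \mathbf{H}^{\mathrm{H}}[\nu]\mathbf{H}[\nu]\right\|_{\mathrm{F}}^2. \tag{7.90} $$

Assume that $\mathbf{W}_{\mathrm{RF}}$ contains the first $N_{\mathrm{RF}}$ columns of $\mathbf{U}$, which comes from the SVD
$\mathbf{H}[\nu] = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathrm{H}}$ of the channel matrix on subcarrier $\nu$.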


Chapter 8 


Localization and Sensing with MIMO 


The previous chapters considered different forms of MIMO communications. 
Apart from communications, antenna arrays are also used for classical radar 
applications such as direction-of-arrival (DOA) estimation, target detection, 
localization, velocity estimation, etc. These topics are covered under the 
umbrella of (sensor/radar) array signal processing [51], [121], [122]. Commu- 
nication and radar technologies have evolved along separate paths for many 
years, requiring different physical equipment and separate deployments. A 
commonality is that progressively more antennas/sensors have been utilized
to further exploit the spatial dimension and thereby improve the respective
performance metrics. As the radio hardware becomes more versatile and
software-defined, it is desirable to use the same physical network equipment
for multiple applications, including communication, localization, and sensing.
This design paradigm is called integrated sensing and communication (ISAC)
[123]; it can enable cost savings and innovative use cases but also requires
fundamental design tradeoffs. Since existing wireless communication networks
feature wide-area coverage, they are a suitable platform to evolve into ISAC
systems. The integration can take place by sharing the hardware and/or the
waveforms.

Sensing refers to radar-like applications that aim to obtain spatial knowledge
of the physical environment by transmitting known signals and observing
their reflections off various objects. Typical sensing applications are target
detection, target range and velocity estimation, and target tracking. Local- 
ization refers to determining the map coordinates of an object. This chapter 
covers the following fundamental applications: 1) far-field DOA estimation, 
2) localization, and 3) target detection. The aim is to analyze how having 
multiple antennas helps carry out the respective tasks. 


8.1 Direction-of-Arrival Estimation 


In this section, we consider DOA estimation, where the goal is to determine
the angular directions $(\varphi, \theta)$ of multiple waves that impinge on an antenna
array. We consider the free-space LOS channel model developed in Chapter 4.
The transmitters/sources that radiate the waves are assumed to be in the far
field of the receiver array, and we will use the narrowband signal assumption
from Section 2.3.4. In other words, we assume the maximum difference in
the propagation delay over the array is much shorter than the symbol time,
as in (4.7). This results in frequency flatness and the system model in (4.9).
We note that estimating the angles $(\varphi, \theta)$ is the first step towards solving
a LOS localization problem, where the physical locations of the sources are
determined by also estimating the distance each signal has traveled.

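We consider a receiver array with $M$ antennas that has $K$ single-antenna
radiating sources (transmitters) in its far field. The signal radiated by source
$k$ impinges on the array as a planar wave from some angular direction $(\varphi_k, \theta_k)$,
and the common channel gain to all the receive antennas is denoted by $\beta_k > 0$.
The location of receive antenna $m$ is denoted by $\mathbf{u}_m \in \mathbb{R}^3$, as in Chapter 4.
It follows from (4.113) that the array response vector for source $k$ is

$$ \mathbf{a}(\varphi_k, \theta_k) = \begin{bmatrix} e^{j\frac{2\pi}{\lambda}\mathbf{u}_1^{\mathrm{T}}\mathbf{p}_k} \\ e^{j\frac{2\pi}{\lambda}\mathbf{u}_2^{\mathrm{T}}\mathbf{p}_k} \\ \vdots \\ e^{j\frac{2\pi}{\lambda}\mathbf{u}_M^{\mathrm{T}}\mathbf{p}_k} \end{bmatrix}, \tag{8.1} $$

where $\mathbf{p}_k$ is the unit-length vector that points from the origin to source $k$:

$$ \mathbf{p}_k = \begin{bmatrix} \cos(\varphi_k)\cos(\theta_k) \\ \sin(\varphi_k)\cos(\theta_k) \\ \sin(\theta_k) \end{bmatrix}. \tag{8.2} $$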
The received signal $\mathbf{y}[l] \in \mathbb{C}^M$ at integer sample index $l$ can be expressed as

$$ \mathbf{y}[l] = \sum_{k=1}^{K} \sqrt{\beta_k}\, e^{j\psi_k}\, \mathbf{a}(\varphi_k, \theta_k)\, x_k[l] + \mathbf{n}[l], \tag{8.3} $$

where $x_k[l]$ is the baseband equivalent of the signal emitted by source $k$,
sampled at time index $l$, and $\psi_k$ is the phase-shift introduced along the
respective propagation path. The signals $x_k[l]$ have zero mean and variance
$P_k$; they are random and unknown to the receiver, for example, because they
might contain data. The independent receiver noise is distributed as
$\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2\mathbf{I}_M)$.
We assume the source signals $x_k[l]$ and $x_i[l]$ are independent for $k \neq i$.

The DOA estimation problem is to estimate $(\varphi_k, \theta_k)$, for $k = 1, \ldots, K$,
using the received signals $\mathbf{y}[l]$, for $l = 1, \ldots, L$. We assume the channel gains,
phase-shifts, and array response vectors are constant during these $L$ consecutive
samples. Exploiting multiple samples is useful to increase the estimation
accuracy by improving the SNR and averaging out randomness in the source
signals. We assume the number of sources, $K$, is known.¹ We further assume
that $K < M$, which is required by some of the algorithms we will describe.


¹There are algorithms that detect the number of sources; see [122] for details.
The DOA estimation algorithms can be classified into two main branches:
• Non-parametric (model-free) methods;
• Parametric (model-based) methods.


The non-parametric methods assume that the characteristics of the signals
$x_k[l]$ are unknown. However, the structure of the array response vector in (8.1)
is assumed to be known because it is only based on the array geometry. On the
other hand, the parametric methods utilize the statistics of the input signals
$x_k[l]$ in addition to the structure of the array response vector. Since they
exploit the specific system model parameters, they generally perform better 
than the non-parametric methods. In the following, we will first cover two 
non-parametric beamforming methods for DOA estimation and then describe 
a parametric subspace-based method that exploits the noise subspace. 


8.1.1 Conventional Non-Parametric Beamforming Method 


The beamforming methods considered for DOA estimation in this chapter are 
non-parametric. They are sometimes called spectral-based since they construct 
a DFT-like spatial spectrum that shows how much power is received from 
different angles $(\varphi, \theta)$. The main peaks of that spectrum are the DOA estimates,
but there will also be ripples created by side-lobes from receive beamforming. 

We recall that there are K sources whose angular directions are to be 
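estimated. In beamforming techniques, a receive combining vector $\mathbf{w} \in \mathbb{C}^M$ is
applied to all the received signals in (8.3): $\mathbf{w}^{\mathrm{H}}\mathbf{y}[l]$, for $l = 1, \ldots, L$. Then, the
average squared magnitude of the combined signals is computed as

$$ P(\mathbf{w}) = \frac{1}{L}\sum_{l=1}^{L}\left|\mathbf{w}^{\mathrm{H}}\mathbf{y}[l]\right|^2 = \mathbf{w}^{\mathrm{H}}\hat{\mathbf{R}}_y\mathbf{w}, \quad \text{where } \hat{\mathbf{R}}_y = \frac{1}{L}\sum_{l=1}^{L}\mathbf{y}[l]\mathbf{y}^{\mathrm{H}}[l]. \tag{8.4} $$
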

We recognize $\hat{\mathbf{R}}_y$ as the unbiased sample average estimator (i.e., a matrix
generalization of (2.171)) of the correlation matrix of $\mathbf{y}[l]$, which is defined as

$$ \mathbf{R} = \mathbb{E}\{\mathbf{y}[l]\mathbf{y}^{\mathrm{H}}[l]\}. \tag{8.5} $$


The randomness in $\mathbf{y}[l]$ is assumed independent across the $L$ samples. Hence,
the sample average correlation matrix $\hat{\mathbf{R}}_y$ approaches its statistical mean,
$\mathbf{R}$, when the number of samples $L$ goes to infinity, as previously discussed
in Section 2.6.1. This implies that $P(\mathbf{w})$ in (8.4) is the sample estimate of
$\mathbb{E}\{|\mathbf{w}^{\mathrm{H}}\mathbf{y}[l]|^2\}$, which is the average power of the signal obtained when the
receive combining vector $\mathbf{w}$ is applied.
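To make (8.4) concrete, here is a minimal NumPy sketch that simulates the model (8.3) for a half-wavelength ULA and checks that the average combined power equals the quadratic form $\mathbf{w}^{\mathrm{H}}\hat{\mathbf{R}}_y\mathbf{w}$. The helper names `ula_response`, `simulate_received`, and `sample_correlation` are hypothetical. The choices $\beta_k = P_k = 1$, uniformly random phase-shifts $\psi_k$, and Gaussian source signals are illustrative assumptions.

```python
import numpy as np

def ula_response(phi, M, spacing=0.5):
    """Array response (8.8) of an M-antenna ULA; spacing is Delta/lambda."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * m * spacing * np.sin(phi))

def simulate_received(phis, M, L, snr_db, rng, spacing=0.5):
    """Samples y[l] from (8.3), assuming beta_k = P_k = 1 and random phases psi_k."""
    K = len(phis)
    A = np.column_stack([ula_response(p, M, spacing) for p in phis])  # M x K
    psi = rng.uniform(0, 2 * np.pi, K)
    X = (rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))) / np.sqrt(2)
    sigma2 = 10 ** (-snr_db / 10)
    N = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))
    return (A * np.exp(1j * psi)) @ X + N  # column l holds y[l]

def sample_correlation(Y):
    """Sample average (1/L) sum_l y[l] y[l]^H, the matrix in (8.4)."""
    return Y @ Y.conj().T / Y.shape[1]

rng = np.random.default_rng(0)
Y = simulate_received([np.pi / 6], M=10, L=25, snr_db=0.0, rng=rng)
R_hat = sample_correlation(Y)
w = ula_response(np.pi / 6, 10)
power_direct = np.mean(np.abs(w.conj() @ Y) ** 2)  # (1/L) sum_l |w^H y[l]|^2
power_quadratic = np.real(w.conj() @ R_hat @ w)     # w^H R_hat w
```

Because $\hat{\mathbf{R}}_y$ is computed once, the power can then be evaluated for many candidate vectors $\mathbf{w}$ without revisiting the $L$ samples.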

Suppose one of the true DOAs is $(\varphi_k, \theta_k)$. We can maximize $|\mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi_k, \theta_k)|^2$
(among all unit-norm combining vectors) by selecting $\mathbf{w}$ parallel to $\mathbf{a}(\varphi_k, \theta_k)$.
This corresponds to an MRC receiver, and we recall from Figure 4.8 that
MRC is a spatial bandpass filter that only provides a large power value if the
angle used for MRC matches the angle of the incoming signal. Based on this
principle, in the conventional beamforming method, we select the combining
vector $\mathbf{w}(\varphi, \theta)$ as a function of $(\varphi, \theta)$ to match the array response vector:


$$ \mathbf{w}(\varphi, \theta) = \mathbf{a}(\varphi, \theta). \tag{8.6} $$


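Therefore, $P(\mathbf{w}(\varphi, \theta))$ is an estimate of how much power is received from the
direction $(\varphi, \theta)$, and estimating the DOAs corresponds to finding the $K$ peaks
of the function $P(\mathbf{w}(\varphi, \theta))$. The peaks will be clearly noticeable when the
SNR is high and/or the number of samples $L$ is sufficiently large.

Inserting (8.6) into the power spectrum in (8.4), the DOA estimates
$(\hat{\varphi}_k, \hat{\theta}_k)$, for $k = 1, \ldots, K$, are obtained as the $K$ highest peaks of the function

$$ P_{\mathrm{conv}}(\varphi, \theta) = \frac{1}{L}\sum_{l=1}^{L}\left|\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\mathbf{y}[l]\right|^2 = \mathbf{a}^{\mathrm{H}}(\varphi, \theta)\hat{\mathbf{R}}_y\mathbf{a}(\varphi, \theta). \tag{8.7} $$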


This method can be applied with any array geometry. The only
requirement is that the array response vector is known for any angle pair
$(\varphi, \theta)$, which implies that the antennas must be phase-synchronized.
Suppose we have a ULA with $M$ antennas, and the $K$ sources are in the
same horizontal plane as the array. The array response vector is then given in
(4.74) as

$$ \mathbf{a}(\varphi) = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta}{\lambda}\sin(\varphi)} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta}{\lambda}\sin(\varphi)} \end{bmatrix}, \tag{8.8} $$

which is only a function of the azimuth angle. In this case, the DOA estimation
problem turns into estimating the azimuth angles $\varphi_1, \ldots, \varphi_K$. The power
spectrum in (8.7) whose $K$ highest peaks are the DOA estimates simplifies to

$$ P_{\mathrm{conv}}(\varphi) = \frac{1}{L}\sum_{l=1}^{L}\left|\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{y}[l]\right|^2 = \frac{1}{L}\sum_{l=1}^{L}\left|\sum_{m=1}^{M} e^{j2\pi\frac{(m-1)\Delta}{\lambda}\sin(\varphi)}\, y_m[l]\right|^2. \tag{8.9} $$
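The conventional estimator in (8.9) can be sketched in a few lines of NumPy for a single source. This is an illustration under stated assumptions, not a reference implementation: a half-wavelength ULA, unit channel gain and signal power, a per-antenna SNR of 10 dB, $L = 100$ samples, and a uniform search grid over $[-\pi/2, \pi/2]$. The function names are hypothetical, and with $K > 1$ sources one would pick the $K$ highest local maxima instead of the single `argmax`.

```python
import numpy as np

def ula_response(phi, M, spacing=0.5):
    """ULA response (8.8) for a scalar or an array of angles (one column per angle)."""
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * m * spacing * np.sin(np.atleast_1d(phi))[None, :])

def conventional_spectrum(Y, grid, spacing=0.5):
    """P_conv(phi) = (1/L) sum_l |a^H(phi) y[l]|^2 for every phi in grid, cf. (8.9)."""
    A = ula_response(grid, Y.shape[0], spacing)           # M x G
    return np.mean(np.abs(A.conj().T @ Y) ** 2, axis=1)   # length G

rng = np.random.default_rng(1)
M, L, snr_db, phi_true = 10, 100, 10.0, np.pi / 6
sigma2 = 10 ** (-snr_db / 10)
x = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))
Y = ula_response(phi_true, M) @ x[None, :] + n   # single source with beta = P = 1

grid = np.linspace(-np.pi / 2, np.pi / 2, 20001)
spectrum = conventional_spectrum(Y, grid)
phi_hat = grid[np.argmax(spectrum)]                         # DOA estimate
normalized_db = 10 * np.log10(spectrum / spectrum.max())    # 0 dB at the peak
```

The grid resolution bounds the achievable accuracy, so a finer grid (or a local refinement around the peak) is needed when the SNR and $L$ are large.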


Consider DOA estimation of a single source ($K = 1$) located at the DOA
angle $\varphi = \pi/6$ in the same plane as the receiver. The receiver has a ULA
with $M = 2$ or $M = 10$ antennas and $\Delta = \lambda/2$ spacing. Figure 8.1 shows the
normalized power spectrum obtained with 0 dB SNR and $L = 25$ time samples.
The normalization ensures that the peak value on each curve is 0 dB.² The

²The normalization is done by dividing the power spectrum $P_{\mathrm{conv}}(\varphi)$ by its maximum value
$P_{\max} = \max_{\varphi} P_{\mathrm{conv}}(\varphi)$. It becomes easier to compare power spectra obtained with different
SNRs and different numbers of antennas when applying the normalization.
Figure 8.1: The normalized power spectrum obtained with four random realizations of the
source signals and noise with an $M$-antenna ULA and conventional beamforming. There is one
source, and its DOA is $\varphi = \pi/6$. $L = 25$ time samples are used to generate the power
spectrum. The respective DOA estimates are the horizontal values at the peaks of the curves.
The peaks are marked with black and yellow crosses for $M = 2$ and $M = 10$, respectively.


shape of the estimated power spectrum is affected by the random source signals 
and noise samples, which are all Gaussian distributed. We show four curves 
with different random realizations in the figure to showcase the variations one 
can expect. The black and yellow crosses denote the peak values on the curves 
with $M = 2$ and $M = 10$, respectively. The corresponding angle value is the
DOA estimate $\hat{\varphi}$. The curves resemble beam patterns (recall the terminology
in Figure 4.14) with narrower main beams and smaller side-lobes with M = 10 
antennas compared to M = 2. This results in more accurate angle estimates 
with M = 10, in the sense that the yellow crosses are very close to the true 
DOA angle. The randomness shifts the curves and, particularly, modifies the 
side-lobes. However, the system becomes more robust to randomness when 
there are more antennas, thanks to the higher spatial resolution (i.e., smaller 
beamwidth) and larger beamforming gain. 

In Figure 8.2, we show the MSE between the true DOA $\pi/6$ and the
estimate obtained (in radians) using conventional beamforming. The setup
is the same as in Figure 8.1, except that we vary the number of samples,
$L$, on the horizontal axis and consider two different SNR values: 0 dB and
10 dB. As expected, the lowest MSE is achieved using the most antennas and
having the highest SNR. All four curves show that increasing the number
of samples improves the DOA estimation quality. This happens because the
sample average estimator $\hat{\mathbf{R}}_y$ in (8.4) approaches the true correlation matrix
$\mathbf{R} = \mathbb{E}\{\mathbf{y}[l]\mathbf{y}^{\mathrm{H}}[l]\}$ as $L \to \infty$, which progressively makes the power spectrum
less dependent on the random signals and noise.
Figure 8.2: The MSE of the DOA estimation with conventional beamforming. A ULA is
considered with either $M = 2$ or $M = 10$ antennas and an SNR of 0 dB or 10 dB. There is a
single source with the DOA $\varphi_1 = \pi/6$.


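The estimation performance can be studied analytically in the limit $L \to \infty$,
where the power spectrum $P_{\mathrm{conv}}(\varphi)$ with the ULA has the limit

$$ \lim_{L \to \infty} P_{\mathrm{conv}}(\varphi) = \mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{R}\mathbf{a}(\varphi). \tag{8.10} $$

There is $K = 1$ source that sends the zero-mean signal $x_1[l]$ with power $P_1$.
The correlation matrix $\mathbf{R}$ of the received signal in (8.3) can then be computed
as

$$ \mathbf{R} = \mathbb{E}\{\mathbf{y}[l]\mathbf{y}^{\mathrm{H}}[l]\} = P_1\beta_1\mathbf{a}(\varphi_1)\mathbf{a}^{\mathrm{H}}(\varphi_1) + \sigma^2\mathbf{I}_M. \tag{8.11} $$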
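By inserting this expression into the right-hand side of (8.10), we obtain

$$ \begin{aligned} \lim_{L \to \infty} P_{\mathrm{conv}}(\varphi) &= \mathbf{a}^{\mathrm{H}}(\varphi)\left(P_1\beta_1\mathbf{a}(\varphi_1)\mathbf{a}^{\mathrm{H}}(\varphi_1) + \sigma^2\mathbf{I}_M\right)\mathbf{a}(\varphi) \\ &= P_1\beta_1\left|\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{a}(\varphi_1)\right|^2 + \sigma^2\|\mathbf{a}(\varphi)\|^2 \\ &\leq P_1\beta_1\|\mathbf{a}(\varphi)\|^2\|\mathbf{a}(\varphi_1)\|^2 + \sigma^2 M \\ &= P_1\beta_1 M^2 + \sigma^2 M, \end{aligned} \tag{8.12} $$

where we utilized the Cauchy-Schwarz inequality from (2.18) and that array
response vectors satisfy $\|\mathbf{a}(\varphi)\|^2 = M$. The inequality is only satisfied with
equality when $\mathbf{a}(\varphi)$ and $\mathbf{a}(\varphi_1)$ are parallel vectors, which happens for $\varphi = \varphi_1$
and $\varphi = \pi - \varphi_1$ when using a ULA with half-wavelength spacing (recall the
mirror ambiguity from Figure 4.7). If the ULA is deployed so that only sources
in the range $\varphi \in [-\pi/2, \pi/2]$ can occur, $\varphi = \varphi_1$ is the unique maximum of the
asymptotic power spectrum. Since this asymptotic DOA estimate is
exact, the conventional beamforming method is a consistent DOA estimator.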
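The bound in (8.12) can be checked numerically. The sketch below evaluates the asymptotic spectrum $\mathbf{a}^{\mathrm{H}}(\varphi)\mathbf{R}\mathbf{a}(\varphi)$ from (8.10)-(8.11) on a grid and compares its maximum with $P_1\beta_1 M^2 + \sigma^2 M$. The parameter values ($M = 10$, $P_1 = \beta_1 = \sigma^2 = 1$, half-wavelength spacing) and the helper name are assumptions chosen only for illustration.

```python
import numpy as np

def ula_response(phi, M, spacing=0.5):
    """ULA response (8.8), one column per angle."""
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * m * spacing * np.sin(np.atleast_1d(phi))[None, :])

M, P1, beta1, sigma2, phi1 = 10, 1.0, 1.0, 1.0, np.pi / 6
a1 = ula_response(phi1, M)                                    # M x 1
R = P1 * beta1 * (a1 @ a1.conj().T) + sigma2 * np.eye(M)      # (8.11)

grid = np.linspace(-np.pi / 2, np.pi / 2, 20001)
A = ula_response(grid, M)
asymptotic = np.real(np.sum(A.conj() * (R @ A), axis=0))      # a^H(phi) R a(phi), (8.10)
upper_bound = P1 * beta1 * M**2 + sigma2 * M                  # right-hand side of (8.12)
phi_peak = grid[np.argmax(asymptotic)]
```

No randomness is involved here: as $L \to \infty$, the spectrum peak sits at the true DOA (up to the grid spacing) and attains the Cauchy-Schwarz bound.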
Example 8.1. Is the power spectrum in (8.9) related to the Fourier transform?
By introducing the variable $\nu = -M\Delta\sin(\varphi)/\lambda$, we can express the power
spectrum in (8.9) as

$$ P_{\mathrm{conv}}(\varphi) = \frac{1}{L}\sum_{l=1}^{L}\left|\sum_{m=0}^{M-1} e^{-j2\pi\frac{\nu m}{M}}\, y_{m+1}[l]\right|^2. \tag{8.13} $$

If $\nu$ is an integer, we can recognize the term inside the magnitude square as
$\sqrt{M}$ times the DFT of the $M$-length spatial sequence $y_1[l], \ldots, y_M[l]$, based
on the definition in (2.195). Since the DFT is applied to the antenna index
domain instead of the time domain, we call this the spatial DFT, and $\nu$ is
the normalized spatial frequency. The power spectrum is the average of the squared
magnitudes of these spatial DFTs with respect to the time samples $l = 1, \ldots, L$. Different from the
classical DFT that only considers the normalized frequencies $\nu = 0, 1, \ldots, M - 1$,
we consider a real-valued spatial frequency variable $\nu = -M\Delta\sin(\varphi)/\lambda$
because we want to evaluate the power spectrum $P_{\mathrm{conv}}(\varphi)$ for any search
direction $\varphi \in [-\pi/2, \pi/2]$ to find its peaks. Hence, the Fourier transform
appearing in this context is the spatial counterpart to the discrete-time Fourier
transform (DTFT), defined as the DFT but with real-valued frequencies $\nu$.
We might call it the discrete-space Fourier transform (DSFT).
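A practical consequence of (8.13) is that the spectrum can be evaluated on a uniform grid of spatial frequencies with a zero-padded FFT along the antenna axis: an $N$-point FFT of the $M$ antenna samples returns the DSFT at $\nu = kM/N$. The sketch below, which assumes $\Delta = \lambda/2$ and uses arbitrary random data in place of received signals, compares `numpy.fft.fft` with the direct sum in (8.13) and maps each bin back to a search angle.

```python
import numpy as np

rng = np.random.default_rng(2)
M, L, N = 8, 25, 1024            # N-point zero padding samples the DSFT at nu = k*M/N
Y = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))  # stand-in for y[l]

def spectrum_direct(Y, nu):
    """(1/L) sum_l |sum_{m=0}^{M-1} e^{-j 2 pi nu m / M} y_{m+1}[l]|^2, cf. (8.13)."""
    m = np.arange(Y.shape[0])
    kernel = np.exp(-2j * np.pi * nu * m / Y.shape[0])
    return np.mean(np.abs(kernel @ Y) ** 2)

# Spatial FFT along the antenna axis (axis 0), zero-padded to N points.
spectrum_fft = np.mean(np.abs(np.fft.fft(Y, n=N, axis=0)) ** 2, axis=1)
nu_grid = np.arange(N) * M / N   # normalized spatial frequency of each FFT bin

# Map nu back to an angle via nu = -M Delta sin(phi) / lambda with Delta = lambda/2.
# Bins with nu > M/2 represent negative frequencies, since nu and nu - M are equivalent.
nu_wrapped = np.where(nu_grid > M / 2, nu_grid - M, nu_grid)
phi_grid = np.arcsin(np.clip(-2 * nu_wrapped / M, -1.0, 1.0))
```

The FFT route costs $O(N \log N)$ per sample instead of $O(NM)$, at the price of a grid that is uniform in $\sin(\varphi)$ rather than in $\varphi$.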


Thus far, we have considered a half-wavelength-spaced ULA to avoid
spatial undersampling. As discussed in Section 4.3.4, grating lobes appear
in directions other than the main beam's direction when $\Delta$ is greater than
$\lambda/2$ in a ULA. Grating lobes can be acceptable in communications because
the total interference level is unaffected; instead of sending interference to
places close to the intended receiver when $\Delta = \lambda/2$, the same amount is sent
somewhere else when $\Delta > \lambda/2$. The issue is more severe in DOA estimation
since the grating lobes make the ULA unable to distinguish between some
widely different directions. To showcase this phenomenon, we consider the
same setup as in Figure 8.1 but increase the antenna spacing to $\Delta = \lambda$ in
Figure 8.3. The power spectrum now has two equally tall peaks: one at the
true DOA $\varphi = \pi/6$ and another at $\varphi = -\pi/6$. The estimator cannot determine
which one is the true DOA because the array response vectors $\mathbf{a}(\pi/6)$ and
$\mathbf{a}(-\pi/6)$ are equal, as can be seen by computing (8.8) with $\Delta = \lambda$:

$$ \mathbf{a}(\pi/6) = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta}{\lambda}\sin(\pi/6)} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta}{\lambda}\sin(\pi/6)} \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ \vdots \\ (-1)^{M-1} \end{bmatrix} = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta}{\lambda}\sin(-\pi/6)} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta}{\lambda}\sin(-\pi/6)} \end{bmatrix} = \mathbf{a}(-\pi/6). \tag{8.14} $$
P jap DA sinte) (—1)™M-1 7 jog Asatro) 
Figure 8.3: The normalized power spectrum in the same setup as in Figure 8.1 but for a single
random realization and with the antenna spacing $\Delta = \lambda$ instead of $\Delta = \lambda/2$. The large spacing
results in two indistinguishable peaks in the power spectrum: one at the correct angle $\pi/6$ and
a grating lobe at $-\pi/6$.


It is the spatial undersampling (i.e., aliasing) that creates the grating lobe
at $\varphi = -\pi/6$, and the ambiguity remains when $L$, $M$, or the SNR goes to
infinity. The beamforming method cannot be consistent with such ambiguity;
thus, one should only use ULAs with $\Delta \leq \lambda/2$ for DOA estimation.


Example 8.2. Consider a ULA with antenna spacing $\Delta = 2\lambda/3$ and a source
with the DOA $\varphi = \pi/6$. Is there any ambiguity in the DOA estimator? Can
it be resolved by increasing the number of antennas?
For the given spacing, the $m$th entry of the array response vector $\mathbf{a}(\pi/6)$ is

$$ e^{-j2\pi\frac{(m-1)\Delta}{\lambda}\sin(\pi/6)} = e^{-j2\pi(m-1)\frac{2}{3}\cdot\frac{1}{2}} = e^{-j2\pi\frac{m-1}{3}}. \tag{8.15} $$

If we can find another angle $\varphi \in [-\pi/2, \pi/2]$ for which the $m$th entry of $\mathbf{a}(\varphi)$
coincides with (8.15) for every $m$, we have a grating lobe at that angle, and this results
in DOA ambiguity. To check for such an angle, we equate $e^{-j2\pi(m-1)/3}$ to the
$m$th entry of $\mathbf{a}(\varphi)$:

$$ e^{-j2\pi(m-1)\frac{2}{3}\sin(\varphi)} = e^{-j2\pi\frac{m-1}{3}} \quad \Longleftrightarrow \quad \sin(\varphi) = \frac{1 + 3n}{2} \tag{8.16} $$

for some integer $n$. The equality is satisfied for $n = 0$ and $n = -1$. For $n = 0$,
we obtain $\varphi = \pi/6$, which is the true DOA. For $n = -1$, we obtain $\varphi = -\pi/2$
as another solution, so there is a grating lobe at that angle. This ambiguity
cannot be resolved by changing the number of antennas.
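The grating-lobe condition can be generalized and checked numerically. Equating the entries of $\mathbf{a}(\varphi)$ and $\mathbf{a}(\varphi_0)$ for all $m$ gives $\sin(\varphi) = \sin(\varphi_0) + n\lambda/\Delta$ for integer $n$, which reduces to (8.16) when $\Delta = 2\lambda/3$. The sketch below uses a hypothetical helper `grating_angles` to list all such angles in $[-\pi/2, \pi/2]$. It also confirms that the colliding response vectors are identical for several array sizes, so adding antennas does not remove the ambiguity.

```python
import numpy as np

def ula_response(phi, M, spacing):
    """ULA response (8.8); spacing is Delta/lambda."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * m * spacing * np.sin(phi))

def grating_angles(phi0, spacing):
    """All phi in [-pi/2, pi/2] with sin(phi) = sin(phi0) + n/spacing for an integer n."""
    n_max = int(np.ceil(2 * spacing)) + 1
    s = np.sin(phi0) + np.arange(-n_max, n_max + 1) / spacing
    s = s[np.abs(s) <= 1 + 1e-12]          # tolerance for floating-point round-off
    return np.sort(np.arcsin(np.clip(s, -1.0, 1.0)))

# Delta = 2*lambda/3: pi/6 collides with -pi/2 (n = -1 in (8.16)) for every M.
ambiguous_2_3 = {M: np.allclose(ula_response(np.pi / 6, M, 2 / 3),
                                 ula_response(-np.pi / 2, M, 2 / 3))
                 for M in (2, 10, 100)}

# Delta = lambda: pi/6 collides with -pi/6, as in (8.14).
ambiguous_1 = {M: np.allclose(ula_response(np.pi / 6, M, 1.0),
                               ula_response(-np.pi / 6, M, 1.0))
               for M in (2, 10, 100)}

# Delta = lambda/2: no such collision.
ambiguous_half = np.allclose(ula_response(np.pi / 6, 10, 0.5),
                             ula_response(-np.pi / 6, 10, 0.5))
```

For example, `grating_angles(np.pi / 6, 2 / 3)` returns the two angles $-\pi/2$ and $\pi/6$, while half-wavelength spacing leaves only the true DOA.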
Figure 8.4: The normalized power spectrum with an $M$-antenna ULA and conventional
beamforming for a single random realization. There are $K = 2$ sources with the DOAs $\varphi_1 = \pi/6$
and $\varphi_2 = \pi/3$. $L = 25$ time samples are used to compute the power spectrum. The peaks of the
power spectrum are the DOA estimates. The black and yellow crosses denote the peaks with
$M = 2$ and $M = 10$, respectively.


Next, we consider $K = 2$ sources located in the same horizontal plane as
the ULA. The sources have the DOAs $\varphi_1 = \pi/6$ and $\varphi_2 = \pi/3$. The ULA
has $M = 2$ or $M = 10$ antennas with $\Delta = \lambda/2$ spacing. Figure 8.4 shows
the normalized power spectrum obtained using $L = 25$ time samples where
the SNR is 0 dB. We show the spectrum for a single random signal/noise
realization for each value of $M$. In this case, the DOA estimates $\hat{\varphi}_1, \hat{\varphi}_2$ should
be the two tallest power spectrum peaks. When $M = 2$, there is only a single
peak, which is located between the true DOAs and marked with a black cross.
This ULA cannot distinguish between the two sources using its small number
of antennas. This effect can be explained following the beamwidth discussion
in Section 4.3.2. Suppose that the ULA points its beam toward one of
the sources in the receiver processing. If the other source is located within
the main beam (i.e., closer than the first nulls), it will disturb the angle
estimation. The main beam then captures substantial power from
both sources, which makes the power spectrum look as if there were only
one source. As the number of antennas increases, the beamwidth becomes
narrower, and we can observe two distinct peaks in the power spectrum. This
can be seen in the case of $M = 10$, where the peak values are marked with
yellow crosses. To get a rough idea of how many antennas are needed to resolve
two sources, we can use the approximation in (4.62) of the distance between
the beam direction and the first null: $2/M$ radians. If the angular separation of
the sources is larger than this, we can expect them to be distinguishable in
Figure 8.5: The normalized power spectrum with an $M$-antenna ULA and conventional
beamforming for a single random realization. There are $K = 2$ sources with the DOAs $\varphi_1 = \pi/6$
and $\varphi_2 = \pi/5$. $L = 25$ time samples are used to compute the power spectrum. The peaks of the
power spectrum are the DOA estimates. The black and yellow crosses denote the peaks with
$M = 10$ and $M = 30$, respectively.


the power spectrum (when the SNR or $L$ is sufficiently large). Hence, this is a
measure of the array's spatial resolution. In Figure 8.4, the angle difference
between $\varphi_1 = \pi/6$ and $\varphi_2 = \pi/3$ is $\varphi_2 - \varphi_1 = \pi/6 \approx 0.52$ rad. When using
$M = 2$ antennas, we need the angular separation to be greater than $2/M = 1$ rad to
ensure that two distinct peaks are visible in the spectrum. The corresponding
minimum separation is $2/M = 0.2$ rad when $M = 10$, which is small enough to
clearly distinguish the sources, as seen in the figure.


We will now change the DOA of the second source to $\varphi_2 = \pi/5$, which
reduces the angular separation to $\varphi_2 - \varphi_1 = \pi/30 \approx 0.1$ rad. Figure 8.5 shows
the normalized power spectrum for this scenario with either $M = 10$ or $M = 30$
antennas. In this case, we cannot resolve the sources using 10 antennas; we
only observe a single peak marked with a black cross. However, we can
separate the sources with $M = 30$ antennas because $2/M = 2/30 \approx 0.07$ rad,
which is smaller than 0.1 rad. The two peaks in the power spectrum are marked
with yellow crosses and are close to the true DOAs.
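The $2/M$ rule of thumb can be cross-checked without random realizations by using the asymptotic ($L \to \infty$) conventional spectrum for two independent sources. In that case, $\mathbf{R} = \sum_k P\mathbf{a}(\varphi_k)\mathbf{a}^{\mathrm{H}}(\varphi_k) + \sigma^2\mathbf{I}_M$. The sketch assumes unit channel gains, $P = 1$, $\sigma^2 = 0.1$, and half-wavelength spacing. The peak counter `count_main_peaks` is a hypothetical heuristic that only counts local maxima above half of the global maximum, so that side-lobes are ignored.

```python
import numpy as np

def ula_response(phi, M, spacing=0.5):
    """ULA response (8.8), one column per angle."""
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * m * spacing * np.sin(np.atleast_1d(phi))[None, :])

def asymptotic_conventional(phis, M, grid, P=1.0, sigma2=0.1):
    """a^H(phi) R a(phi) with R = sum_k P a_k a_k^H + sigma2 I (independent sources)."""
    A_src = ula_response(np.asarray(phis), M)
    R = P * A_src @ A_src.conj().T + sigma2 * np.eye(M)
    A = ula_response(grid, M)
    return np.real(np.sum(A.conj() * (R @ A), axis=0))

def count_main_peaks(spectrum, rel_level=0.5):
    """Number of local maxima exceeding rel_level times the global maximum."""
    inner = spectrum[1:-1]
    is_peak = (inner > spectrum[:-2]) & (inner >= spectrum[2:])
    return int(np.sum(is_peak & (inner > rel_level * spectrum.max())))

phis = [np.pi / 6, np.pi / 5]                   # separation pi/30, about 0.1 rad
grid = np.linspace(-np.pi / 2, np.pi / 2, 20001)
rule_of_thumb = {M: 2 / M for M in (10, 30)}    # 0.2 rad and about 0.067 rad
peaks = {M: count_main_peaks(asymptotic_conventional(phis, M, grid)) for M in (10, 30)}
```

With these assumptions, the spectrum shows one merged peak for $M = 10$ and two separate peaks for $M = 30$, in line with Figure 8.5.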


The above principles also apply to cases with K > 2 sources. The num- 
bering of the sources is arbitrary in the system model. The conventional 
beamforming method finds K DOA estimates (when the spatial resolution is 
sufficient) but cannot determine how the sources were numbered. Further in- 
formation regarding the transmitted signals is required to distinguish between 
sources, which is assumed unavailable when using non-parametric methods. 
8.1.2 Non-Parametric Capon Beamforming


The conventional beamforming method works very well for DOA estimation in 
the single-source scenario. However, several modifications exist to enhance the 
resolution in multi-source scenarios. The general idea is to go beyond array 
response vectors and use other combining vectors w that make it easier to 
distinguish the sources. This is reminiscent of how MRC can be replaced by 
LMMSE combining in the uplink of multi-user MIMO to suppress inter-user 
interference and thereby achieve a higher data rate. 

One important technique is Capon beamforming, named after its originator
Jack Capon [124]. This technique is also known as minimum-variance distortionless
response (MVDR) beamforming. As the latter name indicates, when
inspecting a specific direction $(\varphi, \theta)$, we should use the beamforming vector
$\mathbf{w}$ that minimizes the variance of the received signal while not distorting the
signal that arrives from the intended direction. The variance is the received
power $P(\mathbf{w}) = \mathbf{w}^{\mathrm{H}}\hat{\mathbf{R}}_y\mathbf{w}$ defined in (8.4), while $\mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi, \theta) = 1$ is required not
to distort the impinging wave coming from the direction $(\varphi, \theta)$. We find the
Capon beamformer by solving the optimization problem

$$ \begin{aligned} \underset{\mathbf{w} \in \mathbb{C}^M}{\text{minimize}} \quad & \mathbf{w}^{\mathrm{H}}\hat{\mathbf{R}}_y\mathbf{w} \\ \text{subject to} \quad & \mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi, \theta) = 1. \end{aligned} \tag{8.17} $$

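When there are $L \geq M$ received signal samples, the estimate $\hat{\mathbf{R}}_y$ of the
correlation matrix is almost always non-singular due to the noise. By defining
$\tilde{\mathbf{w}} = \hat{\mathbf{R}}_y^{1/2}\mathbf{w}$ as a new optimization variable, the problem in (8.17) can be
rewritten (by utilizing the invertibility of $\hat{\mathbf{R}}_y^{1/2}$) as

$$ \begin{aligned} \underset{\tilde{\mathbf{w}} \in \mathbb{C}^M}{\text{minimize}} \quad & \tilde{\mathbf{w}}^{\mathrm{H}}\tilde{\mathbf{w}} \\ \text{subject to} \quad & \tilde{\mathbf{w}}^{\mathrm{H}}\hat{\mathbf{R}}_y^{-1/2}\mathbf{a}(\varphi, \theta) = 1. \end{aligned} \tag{8.18} $$

The vector $\tilde{\mathbf{w}} = \hat{\mathbf{R}}_y^{-1/2}\mathbf{a}(\varphi, \theta)/\|\hat{\mathbf{R}}_y^{-1/2}\mathbf{a}(\varphi, \theta)\|^2$ gives equality in the constraint
and has the minimum norm among all potential solutions because it is parallel
to $\hat{\mathbf{R}}_y^{-1/2}\mathbf{a}(\varphi, \theta)$. Hence, this is the solution to (8.18). The corresponding
solution to (8.17) is

$$ \mathbf{w} = \hat{\mathbf{R}}_y^{-1/2}\tilde{\mathbf{w}} = \frac{\hat{\mathbf{R}}_y^{-1}\mathbf{a}(\varphi, \theta)}{\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\hat{\mathbf{R}}_y^{-1}\mathbf{a}(\varphi, \theta)}, \tag{8.19} $$

which is called Capon/MVDR beamforming. If we insert this vector into the
general power spectrum expression in (8.4), we obtain the Capon spectrum

$$ P_{\mathrm{Capon}}(\varphi, \theta) = \frac{\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\hat{\mathbf{R}}_y^{-1}\hat{\mathbf{R}}_y\hat{\mathbf{R}}_y^{-1}\mathbf{a}(\varphi, \theta)}{\left(\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\hat{\mathbf{R}}_y^{-1}\mathbf{a}(\varphi, \theta)\right)^2} = \frac{1}{\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\hat{\mathbf{R}}_y^{-1}\mathbf{a}(\varphi, \theta)}. \tag{8.20} $$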
Figure 8.6: The normalized power spectrum with conventional and Capon beamforming for a
single random realization. A ULA with $M = 10$ antennas and $\Delta = \lambda/2$ is considered. There is a
single source with the DOA $\varphi = \pi/6$.


The DOA estimates $(\hat{\varphi}_k, \hat{\theta}_k)$, for $k = 1, \ldots, K$, are obtained as the $K$ highest
peaks of the Capon spectrum.
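A minimal NumPy sketch of (8.19) and (8.20) follows. It uses `numpy.linalg.solve` instead of forming $\hat{\mathbf{R}}_y^{-1}$ explicitly. For a deterministic illustration, it plugs in the asymptotic correlation matrix of two independent sources at $\pi/6$ and $\pi/5$ with $M = 20$, unit channel gains, $P = 10$, and $\sigma^2 = 1$. These values are assumptions; with real data, the sample estimate $\hat{\mathbf{R}}_y$ from (8.4) would be passed instead. The function names are hypothetical.

```python
import numpy as np

def ula_response(phi, M, spacing=0.5):
    """ULA response (8.8), one column per angle."""
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * m * spacing * np.sin(np.atleast_1d(phi))[None, :])

def capon_weights(R, a):
    """Capon/MVDR combining vector (8.19): R^{-1} a / (a^H R^{-1} a)."""
    Ria = np.linalg.solve(R, a)
    return Ria / (a.conj() @ Ria)

def capon_spectrum(R, grid):
    """P_Capon(phi) = 1 / (a^H(phi) R^{-1} a(phi)), cf. (8.20)."""
    A = ula_response(grid, R.shape[0])
    RiA = np.linalg.solve(R, A)                       # R^{-1} a(phi) for all phi
    return 1.0 / np.real(np.sum(A.conj() * RiA, axis=0))

M, P, sigma2 = 20, 10.0, 1.0
phis = np.array([np.pi / 6, np.pi / 5])
A_src = ula_response(phis, M)
R = P * A_src @ A_src.conj().T + sigma2 * np.eye(M)   # asymptotic correlation matrix

a = ula_response(np.pi / 6, M)[:, 0]
w = capon_weights(R, a)
distortion = w.conj() @ a                # equals 1: the constraint in (8.17)
output_power = np.real(w.conj() @ R @ w)  # equals the Capon spectrum at pi/6

grid = np.array([np.pi / 6, (np.pi / 6 + np.pi / 5) / 2, np.pi / 5])
p_capon = capon_spectrum(R, grid)        # [source 1, midpoint, source 2]
```

Under these assumptions, the spectrum is far larger at the two source directions than midway between them, which is the increased resolution discussed around Figure 8.8.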

Figure 8.6 compares the normalized power spectrum of conventional and
Capon beamforming for a single random realization. A ULA with $M = 10$
antennas and $\Delta = \lambda/2$ is considered. There is a single source with the DOA
$\varphi = \pi/6$, $L = 25$ samples are used, and the SNR is 0 dB. The figure shows that
the beamwidth of the Capon beamformer is narrower; thus, it has a higher
spatial resolution. The price to pay is that the peak of the power spectrum
can be shifted more from the true DOA when Capon beamforming is used.

The consequence of the larger deviation of the peak is highlighted in 
Figure 8.7, which shows the MSE of the DOA estimates with conventional and 
Capon beamforming for the same setup as in the last figure. This time, we 
vary the number of samples $L$ and consider two SNR values: 0 dB and 10 dB.
Conventional beamforming provides a smaller MSE than Capon beamforming 
in this single-source scenario when the number of samples is low. However, as 
L increases, the MSE gap diminishes. 

The bottom line is that Capon beamforming is unnecessary in the single- 
source scenario. However, it is designed to deal with situations with multiple 
sources, where the improved spatial resolution can help resolve closely spaced 
sources for which conventional beamforming fails. An example of this is 
provided in Figure 8.8, where we consider $K = 2$ sources with the DOAs
$\varphi_1 = \pi/6$ and $\varphi_2 = \pi/5$. A ULA with $M = 20$ antennas and $\Delta = \lambda/2$ is
considered. Figure 8.8 shows the normalized power spectrum of conventional
and Capon beamforming for a single random realization. The spectrum with 
Figure 8.7: The MSE of DOA estimation with conventional and Capon beamforming as a
function of the number of samples used to compute the power spectra. A ULA with $M = 10$
antennas and $\Delta = \lambda/2$ is considered. There is a single source with the DOA $\varphi = \pi/6$, and the
SNR is varied.
Figure 8.8: The normalized power spectrum with conventional and Capon beamforming for
a single random realization. A ULA with $M = 20$ antennas and $\Delta = \lambda/2$ antenna spacing is
considered. There are $K = 2$ sources with the DOAs $\varphi_1 = \pi/6$ and $\varphi_2 = \pi/5$. The peaks of the
power spectrum are the DOA estimates.


conventional beamforming only has one peak, so it cannot resolve the two 
sources. On the other hand, the Capon spectrum has two clearly distinguishable 
peaks thanks to its increased spatial resolution. The locations of the peaks are 
slightly biased/shifted compared to the true DOAs, but Capon beamforming 
can at least provide two decent DOA estimates in this challenging setup. 
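Example 8.3. Prove that Capon beamforming is a consistent DOA estimator
when there is a single source, $\Delta = \lambda/2$, and $\varphi \in [-\pi/2, \pi/2]$ is of interest.
An estimator is consistent if the estimation error vanishes asymptotically.

When $L \to \infty$, it follows that $\hat{\mathbf{R}}_y \to \mathbf{R}$ and the Capon spectrum approaches

$$ \lim_{L \to \infty} P_{\mathrm{Capon}}(\varphi, \theta) = \frac{1}{\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\mathbf{R}^{-1}\mathbf{a}(\varphi, \theta)}, \tag{8.21} $$

where the correlation matrix $\mathbf{R}$ is given by (8.11) for the single-source case.
Using the matrix inversion lemma from Lemma 2.3, $\mathbf{R}^{-1}$ can be expressed as

$$ \begin{aligned} \mathbf{R}^{-1} &= \left(P_1\beta_1\mathbf{a}(\varphi_1, \theta_1)\mathbf{a}^{\mathrm{H}}(\varphi_1, \theta_1) + \sigma^2\mathbf{I}_M\right)^{-1} \\ &= \frac{1}{\sigma^2}\mathbf{I}_M - \frac{P_1\beta_1}{\sigma^2(\sigma^2 + P_1\beta_1 M)}\mathbf{a}(\varphi_1, \theta_1)\mathbf{a}^{\mathrm{H}}(\varphi_1, \theta_1). \end{aligned} \tag{8.22} $$

Hence, $\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\mathbf{R}^{-1}\mathbf{a}(\varphi, \theta)$ in the denominator of (8.21) becomes

$$ \mathbf{a}^{\mathrm{H}}(\varphi, \theta)\mathbf{R}^{-1}\mathbf{a}(\varphi, \theta) = \frac{M}{\sigma^2} - \frac{P_1\beta_1}{\sigma^2(\sigma^2 + P_1\beta_1 M)}\left|\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\mathbf{a}(\varphi_1, \theta_1)\right|^2. \tag{8.23} $$

Inserting this expression into the right-hand side of (8.21), we obtain

$$ \lim_{L \to \infty} P_{\mathrm{Capon}}(\varphi, \theta) = \frac{1}{\dfrac{M}{\sigma^2} - \dfrac{P_1\beta_1}{\sigma^2(\sigma^2 + P_1\beta_1 M)}\left|\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\mathbf{a}(\varphi_1, \theta_1)\right|^2}. \tag{8.24} $$

The asymptotic Capon spectrum is maximized when $|\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\mathbf{a}(\varphi_1, \theta_1)|^2$ is
maximized, which according to the Cauchy-Schwarz inequality in (2.18)
only happens if $\mathbf{a}(\varphi, \theta) = \mathbf{a}(\varphi_1, \theta_1)$. This equation has only one solution
$\varphi \in [-\pi/2, \pi/2]$; thus, Capon beamforming is a consistent DOA estimator.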
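The closed forms in (8.22) and (8.24) are easy to sanity-check numerically. The sketch below compares (8.22) against `numpy.linalg.inv` and evaluates (8.24) on a grid. At $\varphi = \varphi_1$, the expression simplifies to $P_1\beta_1 + \sigma^2/M$. The parameter values ($M = 8$, $P_1 = 2$, $\beta_1 = 0.5$, $\sigma^2 = 0.3$, $\varphi_1 = \pi/6$, half-wavelength ULA) are arbitrary assumptions for illustration.

```python
import numpy as np

M, P1, beta1, sigma2, phi1 = 8, 2.0, 0.5, 0.3, np.pi / 6
m = np.arange(M)

def ula_response(phi):
    """Half-wavelength ULA response, cf. (8.8)."""
    return np.exp(-1j * np.pi * m * np.sin(phi))

a1 = ula_response(phi1)
gain = P1 * beta1
R = gain * np.outer(a1, a1.conj()) + sigma2 * np.eye(M)   # (8.11)

# Closed-form inverse from the matrix inversion lemma, cf. (8.22).
R_inv_closed = (np.eye(M) / sigma2
                - gain / (sigma2 * (sigma2 + gain * M)) * np.outer(a1, a1.conj()))

def capon_asymptotic(phi):
    """Closed-form asymptotic Capon spectrum, cf. (8.24)."""
    overlap = np.abs(ula_response(phi).conj() @ a1) ** 2
    return 1.0 / (M / sigma2 - gain * overlap / (sigma2 * (sigma2 + gain * M)))

grid = np.linspace(-np.pi / 2, np.pi / 2, 4001)
values = np.array([capon_asymptotic(p) for p in grid])
phi_peak = grid[np.argmax(values)]       # should coincide with phi1 up to the grid step
peak_value = capon_asymptotic(phi1)      # equals P1*beta1 + sigma2/M
```

The peak value shows that, asymptotically, the Capon spectrum at the true DOA estimates the received signal power $P_1\beta_1$ plus a residual noise term that vanishes as $M$ grows.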


The DOA estimation performance changes if the sources transmit correlated
signals. We will consider a ULA with $M = 3$ antennas and $\Delta = \lambda/2$ to
exemplify this. There are $K = 2$ sources with the DOAs $\varphi_1 = 0$ and $\varphi_2 = \pi/6$.
The corresponding array response vectors can be computed using (8.8) as

$$ \mathbf{a}(\varphi_1 = 0) = \begin{bmatrix} 1 \\ e^{-j\pi\sin(0)} \\ e^{-j2\pi\sin(0)} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{a}(\varphi_2 = \pi/6) = \begin{bmatrix} 1 \\ e^{-j\pi\sin(\pi/6)} \\ e^{-j2\pi\sin(\pi/6)} \end{bmatrix} = \begin{bmatrix} 1 \\ -j \\ -1 \end{bmatrix}. \tag{8.25} $$

Suppose the random source signals $x_1[l]$ and $x_2[l]$ in (8.3) are always equal except
for a phase-shift: $x_2[l] = x_1[l]e^{j\phi}$. Such sources are called coherent. This
scenario can happen when the same beacon signal is broadcast from two
sources or when the signal from one source is reflected on two objects before
reaching the receiver. For simplicity, suppose that $\beta_1 = \beta_2 = \beta$, $\psi_1 = \psi_2 = 0$,
and $\phi = 0$. The received signal in (8.3) then becomes

$$ \mathbf{y}[l] = \sqrt{\beta_1}\,\mathbf{a}(\varphi_1)x_1[l] + \sqrt{\beta_2}\,\mathbf{a}(\varphi_2)x_2[l] + \mathbf{n}[l] = \sqrt{\beta}\underbrace{\left(\mathbf{a}(0) + \mathbf{a}(\pi/6)\right)}_{=\,\bar{\mathbf{a}}}x_1[l] + \mathbf{n}[l]. \tag{8.26} $$


Hence, when the source signals are coherent, the received signal is the same 
as if there were a single source with the effective array response vector 


1 1 2 
a= lil +l-j|=l1-jl. (8.27) 
1 -1 0 


The asymptotic power spectra in (8.12) and (8.24) for conventional and Capon 
beamforming, respectively, are proportional to |a" (p)a|. To obtain consistent 
estimates, we expect the power spectra to have their peaks at y = yı = 0 
and y = p2 = 7/6. However, this is not the case because |a”(0)a| ~ 3.16 
and |a"(7/6)a| ~ 1.41, while |a"(arcsin(1/4))a| ~ 3.41 gives a larger value. 
This specific angle gives a(arcsin(1/4)) = [1, (1 —j)/V2,—j]’, which resembles 
(8.27). Hence, the conventional and Capon beamforming methods are not 
consistent DOA estimators when the sources are coherent. 
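The numbers above are easy to reproduce. This sketch uses a hypothetical helper `ula_response` implementing the $M = 3$, $\Delta = \lambda/2$ vectors from (8.25), evaluates $|\mathbf{a}^{\mathrm{H}}(\varphi)\bar{\mathbf{a}}|$ at the two true DOAs and at $\arcsin(1/4)$, and searches a dense grid for the maximizer.

```python
import numpy as np

M = 3   # ULA with Delta = lambda/2 and zero elevation, as in (8.25)

def ula_response(phi):
    """Hypothetical helper: [1, exp(-j*pi*sin(phi)), exp(-j*2*pi*sin(phi))]^T."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(phi))

a_bar = ula_response(0.0) + ula_response(np.pi / 6)   # effective vector (8.27)

def coherent_gain(phi):
    """|a^H(phi) a_bar|, the quantity both asymptotic spectra increase with."""
    return np.abs(ula_response(phi).conj() @ a_bar)

grid = np.linspace(-np.pi / 2, np.pi / 2, 100001)
steering = np.exp(-1j * np.pi * np.outer(np.sin(grid), np.arange(M)))   # row i is a^T(grid[i])
gains = np.abs(steering.conj() @ a_bar)
phi_peak = grid[np.argmax(gains)]

print(coherent_gain(0.0), coherent_gain(np.pi / 6), coherent_gain(np.arcsin(0.25)))
print(phi_peak, np.arcsin(0.25))
```

Analytically, $\mathbf{a}^{\mathrm{H}}(\varphi)\bar{\mathbf{a}} = 2 + \sqrt{2}e^{j(\pi\sin(\varphi) - \pi/4)}$, whose magnitude is maximized only at $\sin(\varphi) = 1/4$; this explains why the grid search lands between the two true DOAs.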


Example 8.4. Consider $K$ sources that transmit independent data signals with power $P$ and suppose the noise variance is $\sigma^2$. How is Capon beamforming related to LMMSE combining in this case?

In this setup, the Capon beamforming expression in (8.19) has the limit

$$
\mathbf{w} \to c\,\mathbf{R}^{-1}\mathbf{a}(\varphi,\theta) = c\left(\sum_{k=1}^{K} P\beta_k\mathbf{a}(\varphi_k,\theta_k)\mathbf{a}^{\mathrm{H}}(\varphi_k,\theta_k) + \sigma^2\mathbf{I}_M\right)^{-1}\mathbf{a}(\varphi,\theta) \tag{8.28}
$$

as $L \to \infty$, where $c = 1/(\mathbf{a}^{\mathrm{H}}(\varphi,\theta)\mathbf{R}^{-1}\mathbf{a}(\varphi,\theta))$ is a scalar. When inspecting the $k$th source direction by setting $\varphi = \varphi_k$ and $\theta = \theta_k$, the limit in (8.28) coincides, up to scaling, with the LMMSE combining vector in (6.63) for an uplink multi-user MIMO system where the users transmit with power $P$ and have the LOS channels $\mathbf{h}_k = \sqrt{\beta_k}\mathbf{a}(\varphi_k,\theta_k)$ for $k = 1,\ldots,K$. The only difference between the two expressions is the scalar $c$, which is selected to get a distortionless signal in Capon beamforming, while it is picked to minimize the MSE in LMMSE combining. The fundamental distinction lies in the application: LMMSE combining is implemented to receive uplink data signals when the channels are known, while Capon beamforming aims at estimating the channel parameters without knowing the signals. However, the similarities between the system models imply that Capon beamforming is an approximate form of LMMSE combining.
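To illustrate Example 8.4 numerically, the sketch below builds the asymptotic correlation matrix for three sources and compares the Capon weight in (8.28) with the LMMSE direction. It assumes that the LMMSE combining vector is proportional to $\mathbf{R}^{-1}\mathbf{h}_k$ with $\mathbf{h}_k = \sqrt{\beta_k}\mathbf{a}(\varphi_k,\theta_k)$; the exact scaling in (6.63) is not reproduced, so only the directions are compared. The helper `ula_response`, the angles, gains, and powers are hypothetical.

```python
import numpy as np

def ula_response(phi, M):
    """Hypothetical helper: half-wavelength ULA response at zero elevation."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(phi))

# Illustrative parameters (assumptions)
M, P, sigma2 = 8, 1.0, 0.5
phis = np.array([-0.6, 0.1, 0.7])
betas = np.array([1.0, 0.5, 0.2])

A = np.column_stack([ula_response(p, M) for p in phis])
H = A * np.sqrt(betas)                                  # LOS channels h_k = sqrt(beta_k) a(phi_k)
R = P * (H @ H.conj().T) + sigma2 * np.eye(M)           # limit of the sample correlation matrix

k = 1                                                   # inspect the direction of source k
a_k = A[:, k]
R_inv_a = np.linalg.solve(R, a_k)
w_capon = R_inv_a / (a_k.conj() @ R_inv_a)              # (8.28) with c = 1/(a^H R^{-1} a)
w_lmmse = np.linalg.solve(R, H[:, k])                   # assumed LMMSE direction R^{-1} h_k

# The vectors are parallel if the normalized inner product has magnitude one
alignment = np.abs(w_capon.conj() @ w_lmmse) / (np.linalg.norm(w_capon) * np.linalg.norm(w_lmmse))
```

The distortionless constraint shows up as $\mathbf{w}^{\mathrm{H}}\mathbf{a}(\varphi_k,\theta_k) = 1$ for the Capon weight, while the LMMSE vector only differs by a positive scalar.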
In summary, conventional and Capon beamforming are consistent DOA estimators when there is a single source, provided that the array deployment causes no DOA ambiguity. The main beam with Capon beamforming can be slightly shifted; thus, it requires more samples to be as accurate as conventional beamforming when there is a single source. On the other hand, Capon beamforming has a higher spatial resolution, and when multiple sources have close DOAs, it can resolve sources that conventional beamforming cannot. For a given number of antennas, there is a limit on how closely spaced the sources can be while still being distinguishable by these methods. Correlation between the source signals will reduce the accuracy of the DOA estimates. These are the main reasons for developing more advanced methods that exploit the source and signal statistics. Later in this chapter, we will present a subspace-based technique belonging to that category.


8.1.3 Joint Azimuth and Elevation DOA Estimation 


The Capon spectrum in (8.20) can be applied with arbitrary array geometries 
and source locations, but all the previous simulation examples have considered 
ULAs and sources with zero elevation angles. In this section, we will have a 
closer look at how the theory can be used to jointly estimate the azimuth and 
elevation angles of the sources. 

Figure 8.9 shows the normalized power spectrum with Capon beamforming for a single random realization containing $L = 25$ time samples. A ULA is considered with $M = 16$ antennas and $\Delta = \lambda/2$. There is a single source with the azimuth and elevation DOAs $\varphi = \pi/4$ and $\theta = -\pi/4$, respectively. The SNR is 0 dB. The figure shows infinitely many peak values along the yellow arc in the azimuth-elevation plane. Hence, there is a DOA estimation ambiguity when using a ULA to simultaneously estimate the azimuth and elevation angles. The true source location is marked with a green circle and is on the arc, but we cannot distinguish it from the other points. The reason can be identified by analyzing the array response vector of the source:


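$$
\mathbf{a}(\pi/4, -\pi/4) = \begin{bmatrix} 1 \\ e^{-j\pi\sin(\pi/4)\cos(-\pi/4)} \\ \vdots \\ e^{-j\pi(M-1)\sin(\pi/4)\cos(-\pi/4)} \end{bmatrix} = \begin{bmatrix} 1 \\ e^{-j\pi/2} \\ \vdots \\ e^{-j\pi(M-1)/2} \end{bmatrix}. \tag{8.29}
$$

The same array response vector can be obtained for $\varphi = \pi/6$ and $\theta = 0$:

$$
\mathbf{a}(\pi/6, 0) = \begin{bmatrix} 1 \\ e^{-j\pi\sin(\pi/6)\cos(0)} \\ \vdots \\ e^{-j\pi(M-1)\sin(\pi/6)\cos(0)} \end{bmatrix} = \begin{bmatrix} 1 \\ e^{-j\pi/2} \\ \vdots \\ e^{-j\pi(M-1)/2} \end{bmatrix}. \tag{8.30}
$$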


Figure 8.9: The normalized 2D power spectrum of DOA estimation for a ULA with $M = 16$ antennas, $L = 25$ samples, 0 dB SNR, and Capon beamforming. There is one source with the azimuth and elevation DOAs $\varphi = \pi/4$ and $\theta = -\pi/4$, indicated by the green circle. The color shows the spectrum value. It has infinitely many peaks along the yellow arc, which results in ambiguity. The correct point is only found if the receiver somehow knows the correct elevation DOA. The red cross, red stars, and red circle show the alternative DOA estimates obtained if the receiver assumes that the elevation DOA is $\theta = 0$, $\theta = \pm\pi/3$, or $\theta = \pi/4$, respectively.

Suppose we somehow know the elevation angle to the source (as in previous examples where it was zero). In that case, we only need to look for the peak value along the corresponding horizontal line in Figure 8.9, and there is only a single yellow peak on that line. For example, if we know that $\theta = -\pi/4$, we will find the correct DOA estimate. However, if we incorrectly believe that $\theta = 0$, the red cross at $\varphi = \pi/6$ denotes the unique but incorrect solution we will get. Similarly, if we incorrectly believe that the correct elevation DOA is $\theta = \pm\pi/3$, we will obtain $\varphi = \pi/2$ as the azimuth DOA estimate because it also gives the same array response vector as in (8.29):


$$
\mathbf{a}(\pi/2, \pm\pi/3) = \begin{bmatrix} 1 \\ e^{-j\pi\sin(\pi/2)\cos(\pm\pi/3)} \\ \vdots \\ e^{-j\pi(M-1)\sin(\pi/2)\cos(\pm\pi/3)} \end{bmatrix} = \begin{bmatrix} 1 \\ e^{-j\pi/2} \\ \vdots \\ e^{-j\pi(M-1)/2} \end{bmatrix}. \tag{8.31}
$$

The red stars in the figure show these DOA estimates.

On the other hand, it is not enough to know that $\varphi = \pi/4$ is the correct azimuth angle because there are two peaks on the corresponding vertical line, resulting in an ambiguity between $\theta = \pm\pi/4$ (marked with circles). The bottom line is that a ULA cannot jointly estimate the azimuth and elevation DOAs; we must know the elevation angle to find the correct DOA.
Figure 8.10: The normalized power spectrum of DOA estimation for a UPA with $M_{\mathrm{H}} = M_{\mathrm{V}} = 4$, $\Delta = \lambda/2$, and Capon beamforming. The parameters are otherwise the same as in the ULA case in Figure 8.9. Unlike that case, a single peak is located at the true DOA when using a UPA.


One way to resolve the ambiguity is to use a two-dimensional array, capable of 3D beamforming, to resolve signals both horizontally and vertically. We will exemplify this feature by considering a UPA with $M_{\mathrm{H}} = 4$ horizontal antennas, $M_{\mathrm{V}} = 4$ vertical antennas, and $\Delta = \lambda/2$. The total number of antennas is the same as in the ULA, and all other parameters are unchanged. Figure 8.10 shows the normalized power spectrum of Capon beamforming for a single random realization. In this case, there is only a single peak, and it is located at the true azimuth and elevation DOAs (i.e., $\varphi = \pi/4$ and $\theta = -\pi/4$). This implies that switching from a ULA to a UPA resolves the angular ambiguity issue. The enabling factor is that the array response vector can be expressed using (4.128) as

$$
\mathbf{a}_{4,4}(\pi/4, -\pi/4) = \mathbf{a}_4(-\pi/4, 0) \otimes \mathbf{a}_4(\pi/4, -\pi/4), \tag{8.32}
$$

which is the Kronecker product of the array responses of two 4-antenna ULAs. When considering the ULA earlier in this section, we noticed that multiple DOA pairs give rise to the same vector as in the second factor in (8.32). These are all the values $(\varphi, \theta)$ that give $\sin(\varphi)\cos(\theta) = \sin(\pi/4)\cos(-\pi/4) = 1/2$. If we pick the wrong angle pair, it will give the wrong vector in the first factor in (8.32), which depends only on $\sin(\theta)$. Hence, the array response vector is unique, and the UPA can provide a consistent estimate of both the azimuth and elevation angles. The only requirements are that the antenna spacing satisfies $\Delta \le \lambda/2$ and that we only consider azimuth angles on one side of the array: $\varphi \in [-\pi/2, \pi/2]$.
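The ambiguity and its resolution can be verified directly. This sketch assumes the same exponential array-response convention as (8.29) and implements the Kronecker structure of (8.32) with `np.kron`; the helper names `ula_response` and `upa_response` are hypothetical. It checks whether the angle pairs marked in Figure 8.9 reproduce the true response for the 16-antenna ULA and for the $4 \times 4$ UPA.

```python
import numpy as np

def ula_response(phi, theta, M, spacing=0.5):
    """Hypothetical helper: entries exp(-j*2*pi*spacing*m*sin(phi)*cos(theta)), as in (8.29)."""
    return np.exp(-2j * np.pi * spacing * np.arange(M) * np.sin(phi) * np.cos(theta))

def upa_response(phi, theta, M_H, M_V, spacing=0.5):
    """Hypothetical helper using the Kronecker structure in (8.32):
    vertical factor a_{M_V}(theta, 0) times horizontal factor a_{M_H}(phi, theta)."""
    return np.kron(ula_response(theta, 0.0, M_V, spacing),
                   ula_response(phi, theta, M_H, spacing))

true_doa = (np.pi / 4, -np.pi / 4)
# Angle pairs with sin(phi)cos(theta) = 1/2, i.e., the ambiguous points in Figure 8.9
candidates = [(np.pi / 6, 0.0), (np.pi / 2, np.pi / 3), (np.pi / 2, -np.pi / 3), (np.pi / 4, np.pi / 4)]

ula_ambiguous = [np.allclose(ula_response(p, t, 16), ula_response(*true_doa, 16)) for p, t in candidates]
upa_ambiguous = [np.allclose(upa_response(p, t, 4, 4), upa_response(*true_doa, 4, 4)) for p, t in candidates]
print(ula_ambiguous)   # every candidate reproduces the 16-antenna ULA response
print(upa_ambiguous)   # none reproduces the 4 x 4 UPA response
```

The vertical factor depends on $\sin(\theta)$ alone, so any candidate with the wrong elevation angle is rejected by the UPA even though the horizontal factor matches.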
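Example 8.5. Consider a UPA with $M_{\mathrm{H}} = M_{\mathrm{V}} = 2$ and the antenna spacing $\Delta = \lambda$. If there is a single source with the azimuth and elevation DOAs $\varphi_1 = \pi/4$ and $\theta_1 = -\pi/4$, is there a unique peak in the Capon spectrum?

Following (4.128), the array response vector is

$$
\mathbf{a}_{2,2}(\varphi, \theta) = \begin{bmatrix} 1 \\ e^{-j2\pi\sin(\theta)} \end{bmatrix} \otimes \begin{bmatrix} 1 \\ e^{-j2\pi\sin(\varphi)\cos(\theta)} \end{bmatrix}. \tag{8.33}
$$

If there is an extra peak at some angle pair $(\tilde{\varphi}, \tilde{\theta})$ in the Capon spectrum, then both factors in (8.33) must be the same as for the true source angles. We begin by comparing the first factors, which are equal if

$$
-2\pi\sin(-\pi/4) = \sqrt{2}\pi = -2\pi\sin(\tilde{\theta}) + 2\pi n_1 \quad \Rightarrow \quad \sin(\tilde{\theta}) = n_1 - \frac{1}{\sqrt{2}} \tag{8.34}
$$

for any integer $n_1$. Two elevation angles satisfy this condition: the true DOA $\tilde{\theta}_1 = -\pi/4$ (for $n_1 = 0$) and the extra solution (for $n_1 = 1$)

$$
\tilde{\theta}_2 = \arcsin\left(1 - \frac{1}{\sqrt{2}}\right) \approx 0.297 \text{ rad}. \tag{8.35}
$$

For any given value of $\tilde{\theta}$, the second factor in (8.33) is the same as for the source if

$$
-2\pi\sin(\pi/4)\cos(-\pi/4) = -\pi = -2\pi\sin(\tilde{\varphi})\cos(\tilde{\theta}) + 2\pi n_2 \quad \Rightarrow \quad \sin(\tilde{\varphi})\cos(\tilde{\theta}) = n_2 + \frac{1}{2} \quad \Rightarrow \quad \tilde{\varphi} = \arcsin\left(\frac{n_2 + 1/2}{\cos(\tilde{\theta})}\right) \tag{8.36}
$$

for any integer $n_2$. When $\tilde{\theta}_1 = -\pi/4$ is considered, this equation has two solutions: $\tilde{\varphi}_{1,1} = \pi/4$ (for $n_2 = 0$) and $\tilde{\varphi}_{1,2} = -\pi/4$ (for $n_2 = -1$). For $\tilde{\theta}_2$, we obtain the additional solutions

$$
\tilde{\varphi}_{2,1} = \arcsin\left(\frac{1/2}{\cos(\tilde{\theta}_2)}\right) \approx 0.55 \text{ rad}, \quad \tilde{\varphi}_{2,2} = \arcsin\left(\frac{-1/2}{\cos(\tilde{\theta}_2)}\right) \approx -0.55 \text{ rad}. \tag{8.37}
$$

Hence, the power spectrum has the four peaks $(\tilde{\varphi}_{1,1}, \tilde{\theta}_1)$, $(\tilde{\varphi}_{1,2}, \tilde{\theta}_1)$, $(\tilde{\varphi}_{2,1}, \tilde{\theta}_2)$, and $(\tilde{\varphi}_{2,2}, \tilde{\theta}_2)$. The reason for not having a unique peak is the large antenna spacing of $\Delta = \lambda$, which creates one grating lobe in the azimuth plane and one in the elevation plane. The elevation grating lobe has an azimuth grating lobe of its own, which gives $2 \times 2 = 4$ peaks in total.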
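The four peaks of Example 8.5 can be confirmed by evaluating the $2 \times 2$ UPA response with $\Delta = \lambda$ at each candidate pair. The sketch restates the hypothetical helpers `ula_response` and `upa_response` (same convention as (8.33)) so that the block is self-contained, and uses the integer choices $n_1 \in \{0, 1\}$ and $n_2 \in \{0, -1\}$ derived above.

```python
import numpy as np

def ula_response(phi, theta, M, spacing):
    """Hypothetical helper: entries exp(-j*2*pi*spacing*m*sin(phi)*cos(theta))."""
    return np.exp(-2j * np.pi * spacing * np.arange(M) * np.sin(phi) * np.cos(theta))

def upa_response(phi, theta, M_H, M_V, spacing):
    """Hypothetical helper with the Kronecker structure of (8.33): vertical factor times horizontal factor."""
    return np.kron(ula_response(theta, 0.0, M_V, spacing), ula_response(phi, theta, M_H, spacing))

phi1, theta1 = np.pi / 4, -np.pi / 4
a_true = upa_response(phi1, theta1, 2, 2, spacing=1.0)      # Delta = lambda

theta2 = np.arcsin(1 - 1 / np.sqrt(2))                      # extra elevation solution (8.35)
peaks = []
for theta in (theta1, theta2):
    for n2 in (0, -1):                                      # the integers that keep |sin| <= 1
        phi = np.arcsin((n2 + 0.5) / np.cos(theta))         # azimuth solutions from (8.36)
        peaks.append((phi, theta))

for phi, theta in peaks:
    same = np.allclose(upa_response(phi, theta, 2, 2, spacing=1.0), a_true)
    print(f"phi = {phi:+.3f} rad, theta = {theta:+.3f} rad, identical response: {same}")
```

All four angle pairs produce exactly the same array response vector, so no estimator based on this array can distinguish them.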
8.1.4 Parametric Subspace-Based Methods

Subspace-based methods can provide better DOA estimation accuracy than beamforming methods by exploiting further information regarding the source signals. Similar to beamforming methods, they exploit the estimate $\hat{\mathbf{R}}_{\mathbf{y}}$ of the received signal’s correlation matrix. As the name “subspace-based” suggests, these methods rely on explicitly separating the eigenvectors of $\hat{\mathbf{R}}_{\mathbf{y}}$ into a signal subspace and a noise subspace [51]. MUltiple SIgnal Classification (MUSIC) [125], [126] and Estimation of Signal Parameters by Rotational Invariance Techniques (ESPRIT) [127], [128] are two classic subspace-based DOA estimation methods. The former method exploits the noise subspace, which is spanned by the eigenvectors of the smallest eigenvalues of $\hat{\mathbf{R}}_{\mathbf{y}}$, while the latter technique uses the signal subspace spanned by the eigenvectors of the largest eigenvalues. In this section, we will describe the basic form of the MUSIC algorithm and compare it to Capon beamforming. We refer to the textbook [129] for a detailed description of ESPRIT and variations on MUSIC.

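We revisit the signal model in (8.3), where the received signal at time $l$ is

$$
\mathbf{y}[l] = \sum_{k=1}^{K}\sqrt{\beta_k}e^{j\psi_k}\mathbf{a}(\varphi_k,\theta_k)x_k[l] + \mathbf{n}[l]. \tag{8.38}
$$

We assume the number of sources is smaller than the number of antennas (i.e., $K < M$) and define the vector $\mathbf{p}[l] = [\sqrt{\beta_1}e^{j\psi_1}x_1[l], \ldots, \sqrt{\beta_K}e^{j\psi_K}x_K[l]]^{\mathrm{T}}$ containing the contributions of the $K$ sources to the received signal at the first antenna. If we denote its correlation matrix as $\mathbf{P} = \mathbb{E}\{\mathbf{p}[l]\mathbf{p}^{\mathrm{H}}[l]\}$, the correlation matrix of $\mathbf{y}[l]$ can be expressed as

$$
\mathbf{R} = \mathbb{E}\{\mathbf{y}[l]\mathbf{y}^{\mathrm{H}}[l]\} = \mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}} + \sigma^2\mathbf{I}_M, \tag{8.39}
$$

where $\mathbf{A} \in \mathbb{C}^{M \times K}$ contains the array response vectors of the sources as its columns:

$$
\mathbf{A} = \left[\mathbf{a}(\varphi_1,\theta_1), \ldots, \mathbf{a}(\varphi_K,\theta_K)\right]. \tag{8.40}
$$

The eigendecomposition of the positive semi-definite Hermitian matrix $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$ in (8.39) always exists and can be expressed as

$$
\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}} = \mathbf{U}\mathbf{D}\mathbf{U}^{\mathrm{H}}, \tag{8.41}
$$

where the diagonal entries of $\mathbf{D}$ contain the real-valued non-negative eigenvalues in decreasing order and the columns of the unitary matrix $\mathbf{U}$ are the corresponding unit-length eigenvectors. Adding a scaled identity matrix to $\mathbf{U}\mathbf{D}\mathbf{U}^{\mathrm{H}}$ preserves the eigenvectors but increases all the eigenvalues by the same amount (see Example 2.7). Hence, the eigendecomposition of $\mathbf{R}$ in (8.39) is

$$
\mathbf{R} = \mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}} + \sigma^2\mathbf{I}_M = \mathbf{U}\left(\mathbf{D} + \sigma^2\mathbf{I}_M\right)\mathbf{U}^{\mathrm{H}}. \tag{8.42}
$$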


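If the matrix $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$ has rank $r$, then $r$ eigenvalues of $\mathbf{R}$ are strictly greater than $\sigma^2$ and the remaining $M - r$ eigenvalues are exactly $\sigma^2$. Since we index the eigenvalues in decreasing order, we can decompose $\mathbf{U}$ as $\mathbf{U} = [\mathbf{U}_{\mathrm{s}}, \mathbf{U}_{\mathrm{n}}]$, where $\mathbf{U}_{\mathrm{s}} \in \mathbb{C}^{M \times r}$ contains the eigenvectors corresponding to the non-zero eigenvalues of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$. These $r$ eigenvectors span the signal subspace of $\mathbf{R}$, as the subscript s indicates. This subspace contains all the received signal components, plus the part of the additive noise that falls into it. On the other hand, the columns of $\mathbf{U}_{\mathrm{n}} \in \mathbb{C}^{M \times (M-r)}$ contain the eigenvectors corresponding to the zero-valued eigenvalues of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$. These $M - r$ eigenvectors span the noise subspace of $\mathbf{R}$, as the subscript n indicates. This subspace only contains noise with variance $\sigma^2$.

Since the eigenvectors in $\mathbf{U}_{\mathrm{n}}$ correspond to the zero-valued eigenvalues of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$, we have the relation

$$
\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}\mathbf{U}_{\mathrm{n}} = \mathbf{0}. \tag{8.43}
$$

From linear algebra, we know that if $\mathbf{A}\mathbf{P} \in \mathbb{C}^{M \times K}$ has the full rank of $K$ (recall the assumption $K < M$), then $\mathbf{A}\mathbf{P}\mathbf{z} = \mathbf{0}$ only has the solution $\mathbf{z} = \mathbf{0}$, so (8.43) implies

$$
\begin{aligned}
\mathbf{A}^{\mathrm{H}}\mathbf{U}_{\mathrm{n}} = \mathbf{0} \quad &\Leftrightarrow \quad \mathbf{a}^{\mathrm{H}}(\varphi_k,\theta_k)\mathbf{U}_{\mathrm{n}} = \mathbf{0}, \quad k = 1,\ldots,K \\
&\Rightarrow \quad \mathbf{a}^{\mathrm{H}}(\varphi_k,\theta_k)\mathbf{U}_{\mathrm{n}}\mathbf{U}_{\mathrm{n}}^{\mathrm{H}}\mathbf{a}(\varphi_k,\theta_k) = 0, \quad k = 1,\ldots,K.
\end{aligned} \tag{8.44}
$$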


The rank of $\mathbf{A}\mathbf{P}$ is equal to the rank of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$. To achieve full rank, we need both $\mathbf{A}$ and $\mathbf{P}$ to have full rank. First, the correlation matrix $\mathbf{P}$ is non-singular when the source signals are not fully correlated (coherent). Second, the matrix $\mathbf{A}$ has full rank if and only if the $K$ array response vectors $\mathbf{a}(\varphi_k,\theta_k)$ are linearly independent. When the second condition is satisfied, the array is said to be unambiguous, which enables unique DOA estimates [51]. This is a necessary condition for the existence of a consistent estimator, but in non-asymptotic cases, there might nevertheless be multiple peaks in the spectrum even if there is only a single source, and the DOA estimate might be erroneous. Unambiguity is a standard assumption that holds for the most commonly used arrays. The following lemma presents the conditions for a ULA.


Lemma 8.1. The array response vectors $\mathbf{a}(\varphi_k,\theta_k)$ for $k = 1,\ldots,K$, where $K < M$, are linearly independent for a horizontal ULA with $\Delta \le \lambda/2$ if the $K$ DOAs result in distinctly different values of $\sin(\varphi_k)\cos(\theta_k)$.

Suppose that the source angles satisfy $\theta_k = 0$ and $\varphi_k \in [-\pi/2, \pi/2]$, for $k = 1,\ldots,K$. If the $K$ azimuth angles $\varphi_k$ are different, then Lemma 8.1 implies that the array response vectors $\mathbf{a}(\varphi_k,\theta_k)$ are linearly independent when using a ULA with $\Delta \le \lambda/2$.

If we know $\mathbf{U}_{\mathrm{n}}$ and the array is unambiguous, we can find the DOAs of the sources by searching for $K$ linearly independent array response vectors that give equality in (8.44). The MUSIC algorithm builds on this principle but deals with the situation where $\mathbf{U}_{\mathrm{n}}$ is estimated from the received signals.
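A small numerical experiment makes (8.42) to (8.44) concrete: with the exact correlation matrix, the $M - K$ smallest eigenvalues equal $\sigma^2$, and the corresponding eigenvectors are orthogonal to every source array response vector. The setup (half-wavelength ULA at zero elevation, two uncorrelated sources, and the chosen angles and powers) is a hypothetical illustration.

```python
import numpy as np

# Illustrative setup (assumptions): half-wavelength ULA, zero elevation, K = 2 uncorrelated sources
M, sigma2 = 8, 0.2
phis = np.array([-0.4, 0.5])
K = len(phis)
A = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(phis)))   # columns a(phi_k), as in (8.40)
P = np.diag([1.0, 0.5])                                           # full-rank source correlation matrix
R = A @ P @ A.conj().T + sigma2 * np.eye(M)                        # exact correlation matrix (8.39)

eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns the eigenvalues in ascending order
U_n = eigvecs[:, :M - K]                 # eigenvectors of the M - K smallest eigenvalues

noise_eigs_equal_sigma2 = np.allclose(eigvals[:M - K], sigma2)
signal_eigs_larger = bool(np.all(eigvals[M - K:] > sigma2 + 1e-9))
orthogonality_error = np.linalg.norm(A.conj().T @ U_n)            # zero up to round-off, by (8.44)
```

Note that `np.linalg.eigh` sorts eigenvalues in ascending order, the opposite of the decreasing convention used in (8.41), so the noise subspace consists of the first $M - K$ columns here.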

Under the assumption that $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$ has full rank (i.e., $r = K$), the MUSIC algorithm estimates the DOAs by first constructing the sample average estimator of $\mathbf{R}$ using $L$ samples as

$$
\hat{\mathbf{R}}_{\mathbf{y}} = \frac{1}{L}\sum_{l=1}^{L}\mathbf{y}[l]\mathbf{y}^{\mathrm{H}}[l]. \tag{8.45}
$$

We then compute the eigendecomposition of $\hat{\mathbf{R}}_{\mathbf{y}}$, and let $\hat{\mathbf{U}}_{\mathrm{n}} \in \mathbb{C}^{M \times (M-K)}$ be the matrix whose columns are the unit-length eigenvectors corresponding to the $M - K$ smallest eigenvalues. Inspired by the fact that $\mathbf{a}^{\mathrm{H}}(\varphi,\theta)\mathbf{U}_{\mathrm{n}}\mathbf{U}_{\mathrm{n}}^{\mathrm{H}}\mathbf{a}(\varphi,\theta) = 0$ when $(\varphi,\theta)$ is the DOA of a source, we define the MUSIC spectrum as

$$
P_{\mathrm{MUSIC}}(\varphi,\theta) = \frac{1}{\mathbf{a}^{\mathrm{H}}(\varphi,\theta)\hat{\mathbf{U}}_{\mathrm{n}}\hat{\mathbf{U}}_{\mathrm{n}}^{\mathrm{H}}\mathbf{a}(\varphi,\theta)} \tag{8.46}
$$

for azimuth angles $\varphi \in [-\pi/2, \pi/2]$ and elevation angles $\theta \in [-\pi/2, \pi/2]$. The denominator is nearly zero when the angle pair $(\varphi,\theta)$ is close to a source, which generates a peak in the spectrum. If $\hat{\mathbf{U}}_{\mathrm{n}}$ spans the same subspace as $\mathbf{U}_{\mathrm{n}}$ (e.g., when $\hat{\mathbf{R}}_{\mathbf{y}} = \mathbf{R}$), then the MUSIC spectrum is infinite at the true DOAs. Since we only have access to the estimate $\hat{\mathbf{U}}_{\mathrm{n}}$, the peak values and locations are approximations. When $K$ is known, the $K$ tallest peaks of the MUSIC spectrum are declared as the DOA estimates. When $K$ is unknown, the MUSIC algorithm can also detect the number of sources by inspecting the eigenvalues of $\hat{\mathbf{R}}_{\mathbf{y}}$. By comparing them with a threshold, we can determine how many are substantially larger than $\sigma^2$ and use this value as the estimate $\hat{K}$. We then proceed by identifying the $\hat{K}$ tallest peaks of the MUSIC spectrum.
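The MUSIC steps above can be sketched for a ULA at zero elevation as follows. This is a minimal illustration, not a reference implementation: it assumes complex Gaussian sources and noise, a half-wavelength ULA, a grid search over azimuth only, and a hypothetical threshold rule for $\hat{K}$ that places the threshold a margin above the largest eigenvalue expected from noise alone (the Marchenko-Pastur upper edge $\sigma^2(1 + \sqrt{M/L})^2$). With $L = M$, the noise eigenvalues of $\hat{\mathbf{R}}_{\mathbf{y}}$ spread up to about $4\sigma^2$, so a threshold just above $\sigma^2$ would overestimate $K$. All function names are hypothetical.

```python
import numpy as np

def ula_steering(grid, M):
    """Hypothetical helper: columns are a(phi) for a half-wavelength ULA at zero elevation."""
    return np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(grid)))

def estimate_num_sources(R_hat, sigma2, L, margin=2.0):
    """Hypothetical threshold rule: count eigenvalues above margin * sigma2 * (1 + sqrt(M/L))**2."""
    M = R_hat.shape[0]
    threshold = margin * sigma2 * (1 + np.sqrt(M / L)) ** 2
    return int(np.sum(np.linalg.eigvalsh(R_hat) > threshold))

def music_spectrum(R_hat, grid, K):
    """MUSIC spectrum (8.46) evaluated on a grid of azimuth angles."""
    M = R_hat.shape[0]
    _, V = np.linalg.eigh(R_hat)                       # ascending eigenvalues
    U_n = V[:, :M - K]                                 # estimated noise subspace
    proj = U_n.conj().T @ ula_steering(grid, M)
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=0)     # 1 / (a^H U_n U_n^H a)

def tallest_peaks(spectrum, K):
    """Indices of the K tallest local maxima of a 1D spectrum."""
    inner = spectrum[1:-1]
    idx = np.where((inner > spectrum[:-2]) & (inner > spectrum[2:]))[0] + 1
    return idx[np.argsort(spectrum[idx])[::-1][:K]]

# Simulated data loosely mirroring Figure 8.11(b): M = 50, L = 50, 0 dB SNR (assumed noise model)
rng = np.random.default_rng(1)
M, L, sigma2 = 50, 50, 1.0
true_phis = np.array([np.pi / 6, np.pi / 5])
X = (rng.standard_normal((2, L)) + 1j * rng.standard_normal((2, L))) / np.sqrt(2)
N = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))
Y = ula_steering(true_phis, M) @ X + N                 # columns are y[1], ..., y[L]

R_hat = Y @ Y.conj().T / L                             # sample correlation matrix (8.45)
K_hat = estimate_num_sources(R_hat, sigma2, L)
grid = np.linspace(-np.pi / 2, np.pi / 2, 4001)
spectrum = music_spectrum(R_hat, grid, K_hat)
phi_hat = np.sort(grid[tallest_peaks(spectrum, K_hat)])
```

To reproduce the effect studied in Figure 8.15 in this 1D setting, call `music_spectrum` with a deliberately wrong number of sources: a larger value only removes a few noise eigenvectors, whereas a smaller value moves a signal eigenvector into the assumed noise subspace.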

In Figure 8.11, we show the normalized power spectra using either Capon beamforming or the MUSIC algorithm for DOA estimation. A ULA is considered with $M = 50$ antennas and $\Delta = \lambda/2$. There are $K = 2$ sources with the azimuth DOAs $\varphi_1 = \pi/6$ and $\varphi_2 = \pi/5$, respectively. The elevation angles are zero. The source signals are independent and Gaussian distributed. The transmit power and channel gains are the same, and the common SNR is 0 dB. We use $L = 100$ samples in Figure 8.11(a). In this case, both Capon beamforming and the MUSIC algorithm have peaks around the true DOAs, although the peaks are not exactly centered at the true values, so the DOA estimates are not exact. However, the resolution of MUSIC is superior since its main lobes are narrower. When we decrease the number of samples to $L = 50$ in Figure 8.11(b), we see that the performance of Capon beamforming deteriorates, while MUSIC performs roughly the same. Since the MUSIC algorithm explicitly exploits the eigenstructure of $\mathbf{R}$ by only using the estimated noise subspace when constructing the power spectrum, it generally provides higher resolution than beamforming methods. The difference is particularly large when $L$ is small; thus, MUSIC is said to be more sample-efficient than the beamforming methods.

Figure 8.11: The normalized power spectrum for a single random realization. Capon beamforming and the MUSIC algorithm are used for DOA estimation using a ULA with $M = 50$ and $\Delta = \lambda/2$. There are $K = 2$ sources with the DOAs $\varphi_1 = \pi/6$ and $\varphi_2 = \pi/5$, respectively. Different numbers of samples are considered when generating the spectra.

Figure 8.12: The normalized power spectrum with Capon beamforming and the MUSIC algorithm in the same setup as in Figure 8.11(b) but with $M = 10$ antennas.

In Figure 8.12, we consider the same setup as in Figure 8.11(b), but reduce the number of antennas to $M = 10$. In this scenario, neither MUSIC nor Capon beamforming can provide useful DOA estimates. Although MUSIC generally provides higher estimation accuracy than the beamforming methods, the number of antennas limits the spatial resolution. The MUSIC algorithm also fails if the sources are closely spaced relative to the beamwidth.


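Example 8.6. Consider DOA estimation with $K = 2$ fully correlated sources. What is the correlation matrix $\mathbf{P}$? What is the rank of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$?

For $K = 2$ sources, we have $\mathbf{p}[l] = [\sqrt{\beta_1}e^{j\psi_1}x_1[l], \sqrt{\beta_2}e^{j\psi_2}x_2[l]]^{\mathrm{T}}$. When the sources are fully correlated, their correlation coefficient has a magnitude of one. Assuming the correlation coefficient is 1 (real-valued) and $\psi_1 = \psi_2 = 0$ for notational convenience, we have $\mathbb{E}\{x_1[l]x_2^*[l]\} = \sqrt{P_1P_2}$. The correlation matrix $\mathbf{P} = \mathbb{E}\{\mathbf{p}[l]\mathbf{p}^{\mathrm{H}}[l]\}$ then becomes

$$
\mathbf{P} = \begin{bmatrix} \beta_1P_1 & \sqrt{\beta_1P_1\beta_2P_2} \\ \sqrt{\beta_1P_1\beta_2P_2} & \beta_2P_2 \end{bmatrix} = \begin{bmatrix} \sqrt{\beta_1P_1} \\ \sqrt{\beta_2P_2} \end{bmatrix}\begin{bmatrix} \sqrt{\beta_1P_1} & \sqrt{\beta_2P_2} \end{bmatrix}, \tag{8.47}
$$

which has rank one since it can be decomposed as the outer product of two vectors. Irrespective of the rank of $\mathbf{A}$, the rank of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$ is also one because

$$
\begin{aligned}
\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}} &= \begin{bmatrix} \mathbf{a}(\varphi_1,\theta_1) & \mathbf{a}(\varphi_2,\theta_2) \end{bmatrix}\begin{bmatrix} \sqrt{\beta_1P_1} \\ \sqrt{\beta_2P_2} \end{bmatrix}\begin{bmatrix} \sqrt{\beta_1P_1} & \sqrt{\beta_2P_2} \end{bmatrix}\begin{bmatrix} \mathbf{a}^{\mathrm{H}}(\varphi_1,\theta_1) \\ \mathbf{a}^{\mathrm{H}}(\varphi_2,\theta_2) \end{bmatrix} \\
&= \left(\sqrt{\beta_1P_1}\mathbf{a}(\varphi_1,\theta_1) + \sqrt{\beta_2P_2}\mathbf{a}(\varphi_2,\theta_2)\right)\left(\sqrt{\beta_1P_1}\mathbf{a}(\varphi_1,\theta_1) + \sqrt{\beta_2P_2}\mathbf{a}(\varphi_2,\theta_2)\right)^{\mathrm{H}}
\end{aligned} \tag{8.48}
$$

is the outer product of a vector with itself. When the magnitude of the correlation coefficient is smaller than 1, $\mathbf{P}$ has rank 2, so rank deficiency only occurs with full correlation.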
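The rank argument in Example 8.6 can be checked numerically. The sketch builds $\mathbf{P}$ for a real correlation coefficient $\rho$ through a hypothetical helper `source_correlation`, with illustrative values of $\beta_kP_k$ and DOAs, and computes the ranks with an explicit tolerance so that floating-point round-off is not counted as an extra dimension.

```python
import numpy as np

def source_correlation(rho, beta_P=(1.0, 0.5)):
    """Hypothetical helper: P for K = 2 sources with psi_1 = psi_2 = 0 and real correlation
    coefficient rho; beta_P holds illustrative values of beta_k * P_k."""
    g = np.sqrt(np.asarray(beta_P))             # [sqrt(beta_1 P_1), sqrt(beta_2 P_2)]
    C = np.array([[1.0, rho], [rho, 1.0]])      # normalized correlation matrix
    return np.outer(g, g) * C                   # equals (8.47) when rho = 1

M = 6
phis = np.array([0.2, 0.6])                     # illustrative DOAs at zero elevation
A = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(phis)))   # half-wavelength ULA

ranks = {}
for rho in (0.9, 1.0):
    P = source_correlation(rho)
    ranks[rho] = (np.linalg.matrix_rank(P, tol=1e-9),
                  np.linalg.matrix_rank(A @ P @ A.conj().T, tol=1e-9))

# (8.48): with rho = 1, APA^H is the outer product of v = sqrt(b1 P1) a_1 + sqrt(b2 P2) a_2 with itself
v = A @ np.sqrt(np.array([1.0, 0.5]))
outer_form_holds = np.allclose(A @ source_correlation(1.0) @ A.conj().T, np.outer(v, v.conj()))
print(ranks, outer_form_holds)
```

With $\rho = 0.9$ both matrices have rank 2, while $\rho = 1$ collapses both to rank 1, which is the situation in which the signal subspace no longer spans both array response vectors.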


560 Localization and Sensing with MIMO 


(a) The correlation coefficient is 0.9.
(b) The correlation coefficient is 1 (i.e., fully correlated sources).
Both panels show the normalized power spectrum [dB] versus the angle-of-arrival $\varphi$ for Capon beamforming and MUSIC.

Figure 8.13: The normalized power spectrum in the same setup as in Figure 8.11(b) except that the source signals are either highly correlated or fully correlated.
We have seen previously in Section 8.1.2 that statistical correlation between the source signals can degrade the DOA estimation accuracy when using Capon beamforming. To further explore this phenomenon, Figure 8.13 considers the same setup as in Figure 8.11(b), but now the Gaussian source signals are correlated. The correlation coefficient is 0.9 in Figure 8.13(a), whereas it is 1 in Figure 8.13(b). Capon beamforming cannot provide accurate DOA estimates in either of these cases, and it gets even worse when the sources are fully correlated. In contrast, the MUSIC algorithm is relatively robust to source correlation. The rank of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$ is two when the correlation coefficient is 0.9, and MUSIC provides peaks around the true DOAs. When the sources are fully correlated, the rank of $\mathbf{A}\mathbf{P}\mathbf{A}^{\mathrm{H}}$ drops to one and the peaks are slightly shifted, but the estimates remain fairly accurate. This example demonstrates that subspace-based methods can handle source correlation relatively efficiently.

Despite the better resolution, the MUSIC algorithm cannot jointly estimate 
the azimuth and elevation DOA angles when a ULA is utilized. Hence, it 
is required to use a two-dimensional array (e.g., a UPA) capable of 3D 
beamforming to solve the general DOA estimation problem. 


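Example 8.7. Consider a UPA with $M_{\mathrm{H}} > 1$ horizontal and $M_{\mathrm{V}} > 1$ vertical antennas with the spacing $\Delta \le \lambda/2$. Show that the array response vectors $\mathbf{a}_{M_{\mathrm{H}},M_{\mathrm{V}}}(\varphi_k,\theta_k)$, for $k = 1, 2$, are linearly independent for any combination of $\varphi_1, \theta_1, \varphi_2, \theta_2 \in [-\pi/2, \pi/2]$, except if both $\varphi_1 = \varphi_2$ and $\theta_1 = \theta_2$.

The UPA array response vector is given in (4.128) as

$$
\mathbf{a}_{M_{\mathrm{H}},M_{\mathrm{V}}}(\varphi,\theta) = \mathbf{a}_{M_{\mathrm{V}}}(\theta, 0) \otimes \mathbf{a}_{M_{\mathrm{H}}}(\varphi,\theta), \tag{8.49}
$$

which is the Kronecker product of two array response vectors for ULAs. For $\mathbf{a}_{M_{\mathrm{H}},M_{\mathrm{V}}}(\varphi_1,\theta_1)$ and $\mathbf{a}_{M_{\mathrm{H}},M_{\mathrm{V}}}(\varphi_2,\theta_2)$ to be linearly dependent, both factors in the Kronecker product must be equal (all these vectors have 1 as their first entry, so linear dependence means equality). We know from Lemma 8.1 that $\mathbf{a}_{M_{\mathrm{V}}}(\theta_1, 0)$ and $\mathbf{a}_{M_{\mathrm{V}}}(\theta_2, 0)$ are only linearly dependent if $\sin(\theta_1)\cos(0) = \sin(\theta_2)\cos(0)$. The only solution in the range $[-\pi/2, \pi/2]$ is $\theta_1 = \theta_2$. Hence, linear dependence requires the elevation angles to be equal.

The same lemma says that $\mathbf{a}_{M_{\mathrm{H}}}(\varphi_k,\theta_k)$, for $k = 1, 2$, are only linearly dependent when $\sin(\varphi_k)\cos(\theta_k)$ has the same value for both sources. Since we already know that $\theta_1 = \theta_2$ is required for linear dependence, this implies that we further need $\sin(\varphi_1) = \sin(\varphi_2)$. The only solution in the range $[-\pi/2, \pi/2]$ is $\varphi_1 = \varphi_2$. Hence, a UPA can uniquely identify sources located in different directions thanks to its ability to resolve sources in the elevation angle domain.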


In Figure 8.14, we show the normalized 2D power spectrum obtained with either Capon beamforming or the MUSIC algorithm when using a UPA with $M_{\mathrm{H}} = 10$, $M_{\mathrm{V}} = 5$, and $\Delta = \lambda/2$. There are $K = 4$ sources located at the intersection points of the red dashed lines; that is, at the DOA azimuth and elevation angle pairs $(\pi/20, \pi/20)$, $(\pi/20, -\pi/20)$, $(-\pi/20, \pi/20)$, and $(-\pi/20, -\pi/20)$. We use $L = 50$ time samples to compute the power spectra, and the SNR is 0 dB. The source signals are independent and Gaussian distributed. By comparing the peaks of the Capon and MUSIC spectra, we note that MUSIC is more accurate and gives peaks close to the true DOA locations. Although the four DOAs share the same azimuth or elevation angles pairwise, the MUSIC algorithm can resolve these sources using a UPA.

The MUSIC spectrum in (8.46) is generated under the assumption that there are $K$ sources by using the eigenvectors corresponding to the $M - K$ smallest eigenvalues of $\hat{\mathbf{R}}_{\mathbf{y}}$. We will now look at the impact of wrongly estimating the number of sources. We consider the same setup as in Figure 8.14 but only consider the MUSIC algorithm. There are $K = 4$ sources, but $\hat{\mathbf{U}}_{\mathrm{n}}$ is constructed by incorrectly assuming $\hat{K} = 3$ sources in Figure 8.15(a) and $\hat{K} = 10$ sources in Figure 8.15(b). When we underestimate the number of sources, we effectively treat one dimension of the signal subspace as a part of the noise subspace. Since that dimension generally contains components from all four source signals (except in the special case where the array response vectors are mutually orthogonal), the assumed noise subspace is no longer orthogonal to any of the source array response vectors, and we lose the ability to accurately estimate the DOAs of all the sources. On the other hand, the MUSIC algorithm is much more robust to overestimating the number of sources. When ten sources are assumed, the dimension of the noise subspace is reduced from $M - K = 46$ to $M - \hat{K} = 40$, but it remains orthogonal to the signal subspace, so the peaks of the spectrum appear roughly at the correct locations. Hence, it is better to first overestimate the number of sources and then refine the estimate if the spectrum contains fewer peaks. If we know that $\hat{K}$ might overestimate $K$, we need an extra step in the algorithm to determine how many peaks to consider as source estimates.

In summary, the subspace-based MUSIC algorithm provides higher DOA estimation accuracy than the beamforming methods. It is relatively robust to source signal correlation and can be used with an unknown number of sources. There exist modified versions of the MUSIC algorithm that are even better at managing signal correlation [129]. Moreover, there are more advanced parametric methods that further exploit the structure of the source signals for better accuracy [51]. Although the theoretical development of the MUSIC algorithm relies on the rank of $\mathbf{P}$, this matrix is not explicitly considered when generating the MUSIC spectrum in (8.46). The correlation matrix estimate $\hat{\mathbf{R}}_{\mathbf{y}}$ and the corresponding noise subspace $\hat{\mathbf{U}}_{\mathrm{n}}$ are used instead. There also exist parametric ML methods (i.e., extensions of the method described in Section 4.2.5) that exploit further knowledge of the source signals’ characteristics for enhanced estimation [51]. The more that is known about the source signals, the higher the DOA estimation accuracy that can be achieved, but the computational complexity might also grow. In all the considered methods, we need to evaluate the power spectra for a dense grid of discrete angles to identify the peaks, which is especially demanding in the 2D case.
(a) 2D spectrum with Capon beamforming.
(b) 2D spectrum with the MUSIC algorithm.

Figure 8.14: The normalized power spectrum for a single random realization when using a UPA with $M_{\mathrm{H}} = 10$, $M_{\mathrm{V}} = 5$, and $\Delta = \lambda/2$. There are $K = 4$ sources located at the intersection points of the red dashed lines: $(\pi/20, \pi/20)$, $(\pi/20, -\pi/20)$, $(-\pi/20, \pi/20)$, and $(-\pi/20, -\pi/20)$. $L = 50$ time samples are used to compute the power spectra. Capon beamforming is compared with the MUSIC algorithm.
(a) Power spectrum when believing there are $\hat{K} = 3$ sources (too few).
(b) Power spectrum when believing there are $\hat{K} = 10$ sources (too many).

Figure 8.15: The normalized power spectrum obtained by the MUSIC algorithm in the same setup as in Figure 8.14. There are $K = 4$ sources, but the noise subspace is constructed by the eigenvectors corresponding to the $M - \hat{K}$ smallest eigenvalues. The number of sources is presumed to be either $\hat{K} = 3$ or $\hat{K} = 10$.
8.2 Localization

Source localization, or simply localization, is an extensively studied topic where the aim is to estimate the unknown location of a source node, object, or person by using the measured data from multiple other sensors that have known locations [130]. We will call the object of interest (with an unknown location) the target node and the other sensors that collect measurements the receivers. The location refers to a point in a selected coordinate system [131], such as a 2D location in $\mathbb{R}^2$ or a 3D location in $\mathbb{R}^3$. The origin is at an arbitrary but predefined location.

We will consider so-called cooperative localization, where the measurements collected at $M$ receivers are fused to estimate the target node’s location. When the target transmits a signal, the receivers constitute a distributed receive antenna array. The signal propagates over an $M$-dimensional SIMO channel, but the goal is not to estimate its $M$ complex coefficients (as in previous chapters) but only to extract the location. To this end, each receiver can measure the time-of-arrival (TOA) of the transmitted signal. If the target node and the receivers have synchronized clocks, the propagation delays to the respective receivers can be computed by knowing the time the signal was transmitted.³ In a LOS scenario, these measurements can be used to deduce the respective distances to the target node by multiplying the delay by the speed of light. The distance measurements can be combined with the known locations of the receivers to extract the target location. If the receivers are synchronized but the transmission time is uncertain, they can instead compare their TOA measurements and determine the time-difference-of-arrival (TDOA). This scenario is of practical interest because it is hard to maintain precise synchronization between a mobile target node at an unknown location and a network of receivers. On the other hand, cables can be drawn between the fixed receivers to enable sharing of measurements and synchronization. When each receiver is equipped with multiple antennas, the receivers can individually estimate the DOA of the signal from the target node. By combining these DOA measurements with the known receiver locations, the target node’s location can be precisely estimated. Many practical systems use hybrid localization methods that fuse different kinds of physical measurements (e.g., angles, signal strengths, inertial sensor measurements, and measurements from different radio systems) so their respective weaknesses can be counteracted. In this section, we only cover the fundamentals of localization. We begin by exemplifying the basic principles of TOA-based localization and then cover the details of the TDOA- and DOA-based localization techniques.


3 Alternatively, the round-trip delay can be measured by sending a signal from a receiver 
to the target node, which immediately sends it back [131]. Half the round-trip delay plus the 
initial transmission time can then be treated as the TOA. This procedure does not require clock 
synchronization but must be repeated M times when there are M receivers, making it inefficient 
for implementing cooperative localization. 
8.2.1 TOA-Based Localization

We will focus on 2D localization for notational convenience. Hence, the aim is to estimate the $(x, y) \in \mathbb{R}^2$ coordinates of the target node using $M$ receivers that are distributed over the azimuth plane. A setup of this kind is shown in Figure 8.16(a), where the target node is denoted by a red star and located at the (unknown) coordinate $(100, 0)$ m. There are $M = 3$ receivers shown as blue squares at the known coordinates $(-100, 0)$, $(0, 100)$, and $(0, -100)$. Note that the target and receivers are equally spaced on a circle with a 100 m radius centered at the origin. We assume there are free-space LOS channels from the target node to each receiver.

If a signal is transmitted by the target node at time 0 (or any other known 
time instance), the TOA at receiver m becomes 


$$t_m = \frac{\sqrt{(x_m - x)^2 + (y_m - y)^2}}{c}, \tag{8.50}$$


where c is the speed of light and $(x_m, y_m)$ denotes the 2D coordinates of
receiver m, for m = 1,...,M. Suppose the TOAs are measured perfectly.
Receiver m can then compute the corresponding propagation distance


$$d_m = t_m c = \sqrt{(x_m - x)^2 + (y_m - y)^2} \tag{8.51}$$


and knows that the target node is located somewhere on a circle around 
receiver m with radius dm. Figure 8.16(a) shows these circles for the M = 3 
receivers. The three circles only intersect at the precise location of the target 
node; thus, three distance measurements are sufficient to uniquely estimate 
the location, which is known as trilateration. However, if we remove one of the 
receivers, the remaining two circles intersect at two locations, which creates 
ambiguity. In conclusion, at least M = 3 TOAs must be measured to find the 
2D target location in the noise-free case. 
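The noise-free trilateration step can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the method used to produce Figure 8.16: it assumes NumPy and uses a common linearization trick in which the circle equation (8.51) of receiver 1 is subtracted from those of the other receivers, which cancels the quadratic terms $x^2 + y^2$ and leaves a linear system in (x, y). The function name `trilaterate` is hypothetical.

```python
import numpy as np

def trilaterate(receivers: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Noise-free 2D trilateration from distances d_m to receivers at (x_m, y_m).

    Subtracting the circle equation (8.51) of receiver 1 from that of receiver m gives
    2(x_m - x_1) x + 2(y_m - y_1) y = (x_m^2 + y_m^2) - (x_1^2 + y_1^2) - (d_m^2 - d_1^2),
    which is linear in the unknown (x, y).
    """
    p1 = receivers[0]
    A = 2.0 * (receivers[1:] - p1)
    b = np.sum(receivers[1:] ** 2, axis=1) - np.sum(p1**2) - (d[1:] ** 2 - d[0] ** 2)
    # Least squares also covers M > 3 receivers (an overdetermined system).
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    return est

# Setup of Figure 8.16(a): receivers at (-100, 0), (0, 100), (0, -100) m; target at (100, 0) m.
receivers = np.array([[-100.0, 0.0], [0.0, 100.0], [0.0, -100.0]])
target = np.array([100.0, 0.0])
d = np.linalg.norm(receivers - target, axis=1)   # perfect distances d_m from (8.51)
print(trilaterate(receivers, d))                 # approximately [100, 0]
```

With only two receivers, the matrix `A` has a single row, so the linear system is underdetermined; this mirrors the two-point ambiguity of two intersecting circles described above.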

In practice, the location estimate will be imperfect due to TOA measure- 
ment errors. The receiver noise creates an upper limit on the TOA measurement 
accuracy for a given bandwidth and SNR. Other error sources are multipath 
propagation (in addition to the LOS path) and synchronization mismatches. 
Suppose the total errors can be modeled as additive Gaussian noise so that 
the noisy distance measured at receiver m is 


$$r_m = d_m + n_m = \sqrt{(x_m - x)^2 + (y_m - y)^2} + n_m, \quad m = 1,\ldots,M, \tag{8.52}$$


where $n_m \sim \mathcal{N}(0, \sigma_d^2)$. The variance $\sigma_d^2$ depends on the wireless technology,
bandwidth, carrier frequency, range, etc. The aim of TOA-based localization
is then to estimate the target location (x, y) as accurately as possible based on
the noisy measurements $r_m$, for m = 1,...,M. In Figure 8.16(b), we consider
the same setup as in Figure 8.16(a), but the receivers only know the noisy
measurements $r_1,\ldots,r_M$ and the variance $\sigma_d^2$. Based on this information, each
receiver can construct a confidence interval for the true value of $d_m$. If we
plot the lower and upper interval limits as two circles, the annulus between
them contains the likely locations of the target node.

(a) M = 3 receivers and noise-free measurements.
(c) M = 10 receivers and noisy measurements.

Figure 8.16: Example of TOA-based localization in the azimuth plane with M receivers and a
single target node. The location of the target node is indicated by a red star, and the locations
of the receivers are shown as blue squares. The circle (or yellow annulus between two circles)
indicates where each receiver believes the target is located. The intersection points/regions can
be used to estimate the target’s location.

In Figure 8.16(b), we consider the noise standard deviation $\sigma_d = 5$ m
and construct our confidence intervals to contain three standard deviations:
$d_m \in [r_m - 15, r_m + 15]$ m. Hence, the confidence interval for receiver m is
represented by the yellow annulus between an inner circle around the receiver
with radius $r_m - 15$ m and an outer circle with radius $r_m + 15$ m. We can confidently
say that the target node is located somewhere in the overlapping area between
the M = 3 annuli. We notice that the red star is in this area, but there is
always some uncertainty in the location estimate when measurement errors
occur. If the errors were smaller, each annulus would shrink, which would improve the
localization accuracy since the overlapping area would also shrink. Another way to
improve the accuracy (for a fixed noise variance) is to fuse the measurements
from more receivers. To see this impact visually, we consider M = 10 receivers
in Figure 8.16(c) and distribute them uniformly on the left half of a circle
centered at the origin with radius 100 m. The confidence intervals are generated
as before, and we can be confident that the target node is located in the area where
all ten annuli intersect. This area shrinks as the number of receivers increases
because the annuli surround the target from different directions and thereby
have less overlap.
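The annulus-intersection picture can be reproduced approximately on a grid. The sketch below assumes NumPy, a 1 m grid, a fixed random seed, an evenly spaced placement of the ten receivers on the left half circle, and the hypothetical helper name `in_all_annuli`; these are illustrative choices, not the settings behind Figure 8.16. A grid point is kept when its distance to every receiver lies within $3\sigma_d$ of the measured $r_m$.

```python
import numpy as np

def in_all_annuli(px, py, receivers, r, sigma, k=3.0):
    """True where (px, py) lies in every annulus r_m - k*sigma <= d_m <= r_m + k*sigma."""
    inside = np.ones(np.broadcast(px, py).shape, dtype=bool)
    for (xm, ym), rm in zip(receivers, r):
        dist = np.hypot(px - xm, py - ym)          # distance to receiver m
        inside &= np.abs(dist - rm) <= k * sigma   # keep points inside annulus m
    return inside

rng = np.random.default_rng(0)
sigma_d = 5.0
target = np.array([100.0, 0.0])
receivers = np.array([[-100.0, 0.0], [0.0, 100.0], [0.0, -100.0]])
d_true = np.linalg.norm(receivers - target, axis=1)
r = d_true + sigma_d * rng.standard_normal(len(receivers))   # noisy r_m as in (8.52)

# 1 m grid; each grid point represents 1 m^2 of area.
X, Y = np.meshgrid(np.linspace(-150.0, 250.0, 401), np.linspace(-150.0, 150.0, 301))
mask = in_all_annuli(X, Y, receivers, r, sigma_d)
print("M = 3: intersection area about", int(mask.sum()), "m^2")

# Assumed placement of M = 10 receivers, evenly spread over the left half circle.
theta = np.linspace(np.pi / 2, 3 * np.pi / 2, 10)
receivers10 = 100.0 * np.column_stack([np.cos(theta), np.sin(theta)])
r10 = np.linalg.norm(receivers10 - target, axis=1) + sigma_d * rng.standard_normal(10)
mask10 = in_all_annuli(X, Y, receivers10, r10, sigma_d)
print("M = 10: intersection area about", int(mask10.sum()), "m^2")
```

For a fixed set of measurements, adding a receiver can only remove grid points from the region, since the result is an intersection of more sets; the numbers printed for two different noise realizations are only indicative.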


8.2.2 TDOA-Based Localization 


As mentioned earlier, TOA-based localization requires clock synchronization
between the target node and all the receivers to turn the TOA measurements
into distance measurements. In practice, it is desirable to alleviate the need
for the target node to be precisely synchronized with the receivers because that
is hard to achieve when the location is unknown and there is only a wireless
connection to it. Even a tiny clock bias of 1 µs can lead to a bias of 300 m
in the distance measurement because the speed of light is immense. In this
section, we consider TDOA-based localization that does not rely on target
node synchronization but only requires that the receivers have a common
reference clock. The target node is assumed to transmit a signal at some
unknown time δ, according to the receivers’ clock. The TOA at receiver m
(in the absence of noise) is then changed from (8.50) to


$$t_m = \frac{\sqrt{(x_m - x)^2 + (y_m - y)^2}}{c} + \delta, \quad m = 1,\ldots,M. \tag{8.53}$$


To remove the unknown δ from these equations, in TDOA-based cooperative
localization, we compute the differences between the TOAs measured at
different receivers. In particular, we pick a reference receiver and give it the
index 1. The TDOA between receivers m and 1 is $t_m - t_1$, which is
independent of δ. If we can measure this TDOA perfectly, the corresponding
distance difference can be computed as

$$\begin{aligned} d_{m,1} &= (t_m - t_1)c \\ &= \sqrt{(x_m - x)^2 + (y_m - y)^2} - \sqrt{(x_1 - x)^2 + (y_1 - y)^2}. \end{aligned} \tag{8.54}$$


For a given measurement value $d_{m,1}$ and known receiver locations $(x_1, y_1)$ and
$(x_m, y_m)$, the equation (8.54) defines one branch of a hyperbola⁴ with respect to
(x, y) in the 2D Cartesian coordinate system. This bowl-like curve identifies
all potential target locations that would give rise to the measured TDOA. In
Figure 8.17(a), we revisit the setup from Figure 8.16(a) with M = 3 receivers.
We let the receiver located at (0, 100) m have the index 1 and be used as the
reference for the TDOAs. By knowing the distance differences $d_{2,1}$ and $d_{3,1}$,
and the receiver locations, we can draw two hyperbola branches. One of the
curves is straight while the other is bent, and they intersect at one point: the
target node location (x, y) = (100, 0) m. The straight curve arises because the
target is equidistant from the receivers at (0, 100) m and (0, -100) m, so their
distance difference is zero and the branch degenerates into the perpendicular
bisector of these two receivers (the x-axis). In this noise-free case, we notice that
at least M = 3 receivers are needed for unambiguous 2D localization based
on TDOAs. This is the same as for TOA-based localization.
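The independence of the TDOAs from the unknown transmit time δ can be checked with a short computation. The sketch assumes NumPy, orders the receivers so that the reference receiver at (0, 100) m comes first, and uses a hypothetical value of δ; the function name `distance_differences` is illustrative.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light [m/s]

def distance_differences(receivers, target, delta):
    """d_{m,1} = (t_m - t_1) c from (8.54), with noise-free TOAs t_m from (8.53)."""
    t = np.linalg.norm(receivers - target, axis=1) / C_LIGHT + delta   # TOAs incl. unknown offset
    return (t[1:] - t[0]) * C_LIGHT

# Receiver 1 (the TDOA reference) is at (0, 100) m, as in Figure 8.17(a).
receivers = np.array([[0.0, 100.0], [-100.0, 0.0], [0.0, -100.0]])
target = np.array([100.0, 0.0])

for delta in (0.0, 1.234e-6):   # two hypothetical transmit times [s]
    print(delta, distance_differences(receivers, target, delta))
```

In both cases $d_{2,1} = 200 - 100\sqrt{2} \approx 58.6$ m, which gives the bent branch, and $d_{3,1} = 0$, which gives the straight branch along the x-axis.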

TDOA-based localization can be utilized even if the distance measurements
are noisy. Similarly to (8.52), we let $n_m \sim \mathcal{N}(0, \sigma_d^2)$ denote the independent
additive noise at receiver m. The M − 1 noisy distance difference measurements
are then given as

$$\begin{aligned} r_{m,1} &= d_m - d_1 + n_m - n_1 \\ &= \sqrt{(x_m - x)^2 + (y_m - y)^2} - \sqrt{(x_1 - x)^2 + (y_1 - y)^2} + n_{m,1}, \end{aligned} \tag{8.55}$$

for m = 2,...,M, where $n_{m,1} = n_m - n_1$. This equation with respect to (x, y) also defines one
branch of a hyperbola, but we cannot draw it due to the noise. We would
like to have an equation of the kind $d_{m,1} = \sqrt{(x_m - x)^2 + (y_m - y)^2} - \sqrt{(x_1 - x)^2 + (y_1 - y)^2}$
as in (8.54). However, the term $d_{m,1}$ is replaced by
$r_{m,1} - n_{m,1}$ in (8.55), where the collective noise realization $n_{m,1} \sim \mathcal{N}(0, 2\sigma_d^2)$
is unknown. Since the measurement value $r_{m,1}$ and the noise distribution are
known, we can compute a confidence interval for the value of $d_{m,1}$ and use its
limits to draw two hyperbola branches. We can then be confident that the
target node is located in between these curves.

In Figure 8.17(b), we consider the same setup as in Figure 8.17(a) but
perform localization based on the noisy measurements $r_{m,1}$, for m = 2, 3.
We assume the noise has the standard deviation $\sigma_d = 5$ m, which implies
that the collective noise realization $n_{m,1}$ has the standard deviation $5\sqrt{2}$ m.


4 A hyperbola is the curve obtained when a double cone is cut by a plane that intersects both
halves of the cone. The general equation is
$\left|\sqrt{(x_m - x)^2 + (y_m - y)^2} - \sqrt{(x_1 - x)^2 + (y_1 - y)^2}\right| = d_{m,1}$, where $(x_1, y_1)$ and
$(x_m, y_m)$ are the two focal points and $d_{m,1} > 0$ is a constant. A hyperbola contains two branches,
which are two unconnected bent curves. Only one of these branches remains when the absolute
value is removed as in (8.54).
(a) M = 3 receivers and noise-free measurements.
(b) M = 3 receivers and noisy measurements.
(c) M = 10 receivers and noisy measurements.


Figure 8.17: Example of TDOA-based localization in the azimuth plane with M receivers and 
a single target node. The location of the target node is indicated by a red star, and the locations 
of the receivers are shown as blue squares. The hyperbola branch (or yellow regions between 
two branches) indicates where each receiver believes the target is located. The intersection 
points/regions can be used to estimate the target’s location. 
We construct our confidence intervals to contain three standard deviations:
$d_{m,1} \in [r_{m,1} - 15\sqrt{2}, r_{m,1} + 15\sqrt{2}]$ m. The hyperbola branches obtained using
the lower and upper limits of this interval are shown as curves in the figure,
and the area in between is marked in yellow. We can be confident that the
target node (red star) is located somewhere in the region where the two
yellow areas intersect. This is indeed the case, but the intersection region is fairly
large, particularly compared to Figure 8.16(b), where we considered the same
setup but with TOA-based localization. Hence, the price to pay for not having
a clock-synchronized target node is reduced localization accuracy.

We increase the number of receivers to M = 10 in Figure 8.17(c). There 
are now M — 1 = 9 yellow regions to consider, and their intersection region 
determines where the target node might be. The localization accuracy increases 
monotonically with the number of receivers. Since all the receivers in this 
example are located on the left-hand side of the target node, the intersection 
region has a long tail toward the right. This can be dealt with in practice by 
surrounding the potential target location with receivers. 

The yellow confidence areas in Figure 8.17 indicate where the target node
might be, but some points in the areas are more likely than others. This
statistical information can be utilized to obtain a specific localization estimate
$(\hat{x}, \hat{y})$. Unfortunately, there is no simple closed-form solution to this estimation
problem because the equations are nonlinear and the noise terms $n_{2,1},\ldots,n_{M,1}$
are correlated. Several algorithms have been developed to tackle this problem
[131]. One approach is to compute the ML estimate of (x, y) given the noisy
observations $r_{m,1}$, for m = 2,...,M. In this case, it is convenient to define the
distance measurement vector $\mathbf{r} = [r_{2,1},\ldots,r_{M,1}]^T \in \mathbb{R}^{M-1}$, the noise vector
$\mathbf{n} = [n_{2,1},\ldots,n_{M,1}]^T \in \mathbb{R}^{M-1}$, and the theoretical distance difference vector
function $\mathbf{d}(x,y) = [d_{2,1}(x,y),\ldots,d_{M,1}(x,y)]^T \in \mathbb{R}^{M-1}$, where the distance
difference for receiver m is given by the function


$$d_{m,1}(x,y) = \sqrt{(x_m - x)^2 + (y_m - y)^2} - \sqrt{(x_1 - x)^2 + (y_1 - y)^2}. \tag{8.56}$$


This function computes what the distance difference would be for a specific
guess (x, y) of the target node location. The ML estimation approach assumes
that the target node’s unknown location (x, y) is deterministic. The measurement
vector $\mathbf{r} = \mathbf{d}(x,y) + \mathbf{n}$ then has the real Gaussian multivariate distribution
$\mathcal{N}(\mathbf{d}(x,y), \mathbf{C})$, where $\mathbf{C} = \mathbb{E}\{\mathbf{n}\mathbf{n}^T\}$ is the covariance matrix of the noise vector.
We know from (2.87) that the PDF of r is


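$$f(\mathbf{r}) = \frac{1}{(2\pi)^{\frac{M-1}{2}} \sqrt{\det(\mathbf{C})}} \, e^{-\frac{1}{2}(\mathbf{r} - \mathbf{d}(x,y))^T \mathbf{C}^{-1} (\mathbf{r} - \mathbf{d}(x,y))}. \tag{8.57}$$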


We recall that the measurement errors $n_m$ in (8.55) were assumed to be
independent and identically distributed as $n_m \sim \mathcal{N}(0, \sigma_d^2)$. This implies that
the (m − 1)th diagonal entry of C can be computed as

$$\mathbb{E}\{n_{m,1}^2\} = \mathbb{E}\{(n_m - n_1)^2\} = \mathbb{E}\{n_m^2\} + \mathbb{E}\{n_1^2\} = 2\sigma_d^2, \tag{8.58}$$
for m = 2,...,M. The (m − 1, i − 1)th off-diagonal entry of C, with m ≠ i, is given as

$$\mathbb{E}\{n_{m,1} n_{i,1}\} = \mathbb{E}\{(n_m - n_1)(n_i - n_1)\} = \mathbb{E}\{n_1^2\} = \sigma_d^2. \tag{8.59}$$


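The noise covariance matrix is

$$\mathbf{C} = \begin{bmatrix} 2\sigma_d^2 & \sigma_d^2 & \cdots & \sigma_d^2 \\ \sigma_d^2 & 2\sigma_d^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma_d^2 \\ \sigma_d^2 & \cdots & \sigma_d^2 & 2\sigma_d^2 \end{bmatrix}, \tag{8.60}$$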


and it is non-diagonal since the noise at the reference receiver affects all the
TDOAs. The ML estimates of x and y are the values that jointly maximize
(8.57), which is equivalent to maximizing the argument of the exponential
function or minimizing $(\mathbf{r} - \mathbf{d}(x,y))^T \mathbf{C}^{-1} (\mathbf{r} - \mathbf{d}(x,y))$. Therefore, the ML
estimates are obtained by solving the problem

$$(\hat{x}, \hat{y}) = \arg\min_{(x,y)} \, (\mathbf{r} - \mathbf{d}(x,y))^T \mathbf{C}^{-1} (\mathbf{r} - \mathbf{d}(x,y)). \tag{8.61}$$
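The structure of (8.58)-(8.60) can be written compactly as $\mathbf{C} = \sigma_d^2(\mathbf{I}_{M-1} + \mathbf{1}\mathbf{1}^T)$, where $\mathbf{1}$ is the all-ones vector. The sketch below, assuming NumPy and arbitrarily chosen values M = 4, $\sigma_d = 5$ m, and a fixed seed, builds this matrix and compares it with a Monte Carlo estimate obtained by forming $n_{m,1} = n_m - n_1$ from i.i.d. Gaussian samples. The function name `tdoa_noise_cov` is hypothetical.

```python
import numpy as np

def tdoa_noise_cov(M: int, sigma_d: float) -> np.ndarray:
    """Covariance (8.60) of [n_{2,1}, ..., n_{M,1}]: 2*sigma_d^2 on the diagonal, sigma_d^2 elsewhere."""
    return sigma_d**2 * (np.eye(M - 1) + np.ones((M - 1, M - 1)))

M, sigma_d = 4, 5.0
C = tdoa_noise_cov(M, sigma_d)

# Monte Carlo check: i.i.d. n_m ~ N(0, sigma_d^2), then n_{m,1} = n_m - n_1.
rng = np.random.default_rng(1)
n = sigma_d * rng.standard_normal((200_000, M))
n_diff = n[:, 1:] - n[:, [0]]
C_emp = n_diff.T @ n_diff / n.shape[0]   # sample covariance (the mean is known to be zero)
print(C)
print(np.round(C_emp, 1))
```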
The objective function to be minimized in (8.61) is not convex with respect 
to (x,y). This makes it computationally expensive to find the solution, for 
example, by evaluating the objective function on a dense grid of potential 
(x, y)-values and picking the best of them. The complexity can be managed 
using an iterative gradient descent algorithm, but it might not converge to 
the global optimum. If sufficient computational resources can be assigned to 
solve the ML estimation problem, it generally provides better accuracy than other
methods; however, alternative lower-complexity methods exist [131].
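The grid-search approach mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the gradient-descent method used for Figure 8.18: it assumes NumPy, a coarse 2 m grid over an assumed search area followed by a finer grid around the best coarse point, and the hypothetical helper names `dist_diff`, `ml_objective`, and `ml_tdoa_grid`. The reference receiver is stored first.

```python
import numpy as np

def dist_diff(x, y, receivers):
    """d_{m,1}(x, y) from (8.56) for m = 2..M; receivers[0] is the reference.

    x and y are 1-D arrays of G candidate points; the result has shape (M-1, G).
    """
    dist = np.hypot(receivers[:, [0]] - x, receivers[:, [1]] - y)   # shape (M, G)
    return dist[1:] - dist[0]

def ml_objective(r, receivers, C_inv, X, Y):
    """(r - d)^T C^{-1} (r - d) from (8.61), evaluated at every grid point (X, Y)."""
    e = r[:, None] - dist_diff(X.ravel(), Y.ravel(), receivers)     # shape (M-1, G)
    return np.einsum("ig,ij,jg->g", e, C_inv, e).reshape(X.shape)

def ml_tdoa_grid(r, receivers, C, x_axis, y_axis, refine=2.0, n_fine=201):
    """Coarse grid search, then a finer grid around the best coarse point."""
    C_inv = np.linalg.inv(C)
    X, Y = np.meshgrid(x_axis, y_axis)
    k = np.argmin(ml_objective(r, receivers, C_inv, X, Y))
    x0, y0 = X.flat[k], Y.flat[k]
    Xf, Yf = np.meshgrid(np.linspace(x0 - refine, x0 + refine, n_fine),
                         np.linspace(y0 - refine, y0 + refine, n_fine))
    k = np.argmin(ml_objective(r, receivers, C_inv, Xf, Yf))
    return Xf.flat[k], Yf.flat[k]

receivers = np.array([[0.0, 100.0], [-100.0, 0.0], [0.0, -100.0]])   # reference receiver first
target = np.array([100.0, 0.0])
M, sigma_d = len(receivers), 5.0
C = sigma_d**2 * (np.eye(M - 1) + np.ones((M - 1, M - 1)))           # (8.60)
x_axis = np.linspace(-200.0, 300.0, 251)                              # assumed search area
y_axis = np.linspace(-200.0, 200.0, 201)

d_true = dist_diff(np.array([target[0]]), np.array([target[1]]), receivers)[:, 0]
print(ml_tdoa_grid(d_true, receivers, C, x_axis, y_axis))   # noise-free: close to (100, 0)

rng = np.random.default_rng(2)
r = d_true + rng.multivariate_normal(np.zeros(M - 1), C)     # noisy r_{m,1} as in (8.55)
print(ml_tdoa_grid(r, receivers, C, x_axis, y_axis))
```

The two-stage grid keeps the number of objective evaluations manageable, but, like any finite grid, it can miss a narrow global minimum if the coarse spacing is too large.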
The estimation accuracy can be evaluated using the root MSE (RMSE) of 
the distance, which is defined as 


$$\mathrm{RMSE} = \sqrt{\mathbb{E}\left\{(\hat{x} - x)^2 + (\hat{y} - y)^2\right\}}, \tag{8.62}$$


where the expectation is computed with respect to the measurement noise. 

The RMSE is shown in Figure 8.18 for the same setup as in Figure 8.17(c),
but with a varying number of receivers. The location estimate $(\hat{x}, \hat{y})$ is obtained
by minimizing the objective in (8.61) using a gradient-descent algorithm. We
consider two values of the noise standard deviation: $\sigma_d = 10$ m and $\sigma_d = 5$ m.
This figure shows that increasing the number of receivers leads to improved 
localization accuracy. This effect is particularly noticeable up until 15 receivers, 
after which the RMSE decays more slowly because the extra receivers are 
placed next to existing ones on the edge of the same half circle. The noise 
standard deviation greatly impacts the localization accuracy, both for a 
given number of receivers and when considering the saturation level that is 
approached when M is large. 
Figure 8.18: The RMSE of localization in (8.62) with respect to the number of receivers. The
TDOA-based location estimate $(\hat{x}, \hat{y})$ is obtained by minimizing the ML objective function in
(8.61) using a gradient-descent algorithm. The same setup is considered as in Figure 8.17(c),
where the receivers are located along the edge of a half circle.


Example 8.8. The TOA measurement errors limit the accuracy of TOA- and 
TDOA-based localization methods. How are these measurements made? 

The TOA is measured by sending a known signal from the target node with
some time duration T, carrier frequency $f_c$, and bandwidth B. The receiver
correlates the received noisy signal with different time-delayed versions of
the transmitted signal to determine which delay best matches the
observation. The peak of the resulting cross-correlation function is the TOA
estimate. The variance of the TOA measurement depends on the mentioned
parameters and the SNR. In particular, the variance in a free-space LOS
channel can be lower bounded as [130, Eq. (5)]

$$\mathrm{Var}\{\mathrm{TOA}\} \geq \frac{1}{8\pi^2 B T f_c^2 \, \mathrm{SNR}} \tag{8.63}$$

when $f_c \gg B$. The TOA measurement accuracy improves as the bandwidth
and carrier frequency increase. Since new wireless systems designed for high-capacity
communications progressively use higher carrier frequencies to make
more bandwidth available, the TOA/TDOA-based localization accuracy can
gradually improve if localization features are integrated into these systems. For
example, a shift from a mid-band system with $f_c$ = 3 GHz and B = 100 MHz
to a high-band system with $f_c$ = 30 GHz and B = 500 MHz will result in
a 500 times lower TOA measurement variance (a factor $10^2$ from the carrier
frequency and a factor 5 from the bandwidth), if all other parameters are
unchanged.
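The 500-fold reduction can be checked directly from (8.63). The sketch below uses only Python's standard `math` module; the signal duration T and the SNR are hypothetical values that are held fixed, so they cancel in the ratio. The function name `toa_var_bound` is illustrative.

```python
import math

def toa_var_bound(B, T, fc, snr):
    """Lower bound (8.63) on Var{TOA} in s^2, valid when fc >> B (snr in linear scale)."""
    return 1.0 / (8 * math.pi**2 * B * T * fc**2 * snr)

T, snr = 1e-4, 100.0   # hypothetical duration [s] and linear SNR, identical in both systems
mid = toa_var_bound(B=100e6, T=T, fc=3e9, snr=snr)    # mid-band system
high = toa_var_bound(B=500e6, T=T, fc=30e9, snr=snr)  # high-band system
print(f"variance ratio: {mid / high:.1f}")            # 500.0
```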
8.2.3 DOA-Based Localization


In TOA- and TDOA-based localization, the measurements are taken in the
time domain, and we assumed that each receiver provides a single TOA
measurement. When the receiver is equipped with multiple antennas, each
antenna can measure a TOA. When the target node is in the far-field of the
receiver, the TOA is approximately equal at all the receive antennas, but
there are noticeable phase-shift differences that enable DOA estimation using
the methods described in Section 8.1.⁵ In DOA-based localization, also called
AOA-based localization, each of the M receivers uses its multiple antennas to
estimate the DOA from the target node. To explain the basics of DOA-based
localization, we consider 2D localization, where the target node and all the
receiver arrays are located in the azimuth plane. The DOA is then represented
by an azimuth angle, which for a target node at the location (x, y) and receiver
m at $(x_m, y_m)$ with $x > x_m$ becomes⁶

$$\varphi_m = \arctan\left(\frac{y - y_m}{x - x_m}\right), \quad m = 1,\ldots,M. \tag{8.64}$$


If the value of $\varphi_m$ is measured perfectly and the receiver location is known,
we can treat (8.64) as an equation with respect to (x, y). In particular, the
relation can be rearranged as $y = \tan(\varphi_m)x + y_m - \tan(\varphi_m)x_m$, which is the
equation of a straight line in the 2D Cartesian coordinate system.

In Figure 8.19(a), we revisit the localization scenario from Figure 8.16(a)
and Figure 8.17(a). By measuring the three angles $\varphi_1, \varphi_2, \varphi_3 \in [-\pi/2, \pi/2]$ in
(8.64), we can draw three straight lines. Each line starts from the respective
receiver location and extends towards the positive x-axis direction since we
assume $x > x_m$. These lines intersect at one point: the target node location.
This is the only intersection point in the figure because any two non-identical
lines can intersect at most once. Hence, having two multiple-antenna receivers
is sufficient for unambiguous 2D localization in the noise-free case if $\varphi_1 \neq \varphi_2$.
This principle is known as triangulation because the two lines plus the line
between the receivers define a triangle. Since we know the length of one side
of the triangle (between the receivers) and two angles (to the target), we can
compute anything related to this triangle, including the target location.
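Triangulation reduces to intersecting the straight lines $\tan(\varphi_m)x - y = \tan(\varphi_m)x_m - y_m$. The sketch below, assuming NumPy and the hypothetical helper names `doa` and `triangulate`, computes the noise-free angles (8.64) for the setup of Figure 8.19(a) and solves the resulting linear system in the least-squares sense, which also accepts more than two receivers.

```python
import numpy as np

def doa(target, receivers):
    """Noise-free azimuth DOAs (8.64); assumes x > x_m for every receiver."""
    return np.arctan((target[1] - receivers[:, 1]) / (target[0] - receivers[:, 0]))

def triangulate(receivers, phi):
    """Least-squares intersection of the lines y = tan(phi_m) x + y_m - tan(phi_m) x_m."""
    t = np.tan(phi)
    A = np.column_stack([t, -np.ones_like(t)])   # row m: tan(phi_m) x - y
    b = t * receivers[:, 0] - receivers[:, 1]    # right-hand side: tan(phi_m) x_m - y_m
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    return est

receivers = np.array([[-100.0, 0.0], [0.0, 100.0], [0.0, -100.0]])
target = np.array([100.0, 0.0])
phi = doa(target, receivers)
print(np.degrees(phi))                       # approximately 0, -45, 45 degrees
print(triangulate(receivers, phi))           # all three lines
print(triangulate(receivers[:2], phi[:2]))   # two receivers already suffice
```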

In practice, the DOA estimates will be subject to measurement errors. 
Suppose we can model the estimate as 


$$r_m = \varphi_m + n_m, \tag{8.65}$$


5 When the receiver is in the radiative near-field of the target node, the TOA differences over
the receiver array are so large that range estimation is also possible, similar to when having M
distributed receivers. We refer to [132] for further details.

6 For notational convenience, we assume that $x > x_m$ so that the angle to the target node
is between $-\pi/2$ and $\pi/2$, and can be obtained using the arctan function. For $x < x_m$, we
must add $\pm\pi$ to (8.64) to get the correct angle. If the receiver array is subject to mirror-like
ambiguity, as illustrated in Figure 4.7 for ULAs, this ambiguity must also be resolved. This can,
for instance, be done using rough TDOA estimation that determines which side the target is at.
(a) M = 3 receivers and noise-free measurements.
(b) M = 3 receivers and noisy measurements.
(c) M = 10 receivers and noisy measurements.


Figure 8.19: Example of DOA-based localization in the azimuth plane with M multi-antenna 
receivers and a single target node. The location of the target node is indicated by a red star, 
and the locations of the receivers are shown as blue squares. The straight line (or yellow areas 
between two lines) indicates where each receiver believes the target is located. The intersection 
points/regions can be used to estimate the target’s location. 
which is the true DOA from (8.64) plus a Gaussian random noise realization
$n_m \sim \mathcal{N}(0, \sigma_\varphi^2)$. The noise is independent between the receivers, but we let
the variance $\sigma_\varphi^2$ be the same for simplicity. The noise variance will depend
on the number of antennas and SNR. It also depends on the wavelength
because we get better angular resolution when the wavelength shrinks (for a
given physical length of the array), so the measurement noise will be reduced.
Based on the measured DOA $r_m$, we know that the true DOA is
$\varphi_m = r_m - n_m$. Although the noise realization is unknown, we can use this
relation to deduce a confidence interval for the DOA. By considering the lower
and upper limits of this interval, we can draw two lines that start at receiver
m and point in slightly different directions. We can then be confident that
the target node is located somewhere between these lines.

In Figure 8.19(b), we consider the same localization setup as in Figure 8.19(a),
but with noisy angle measurements with the standard deviation
$\sigma_\varphi = 4^\circ$. We construct our confidence interval as $\varphi_m \in [r_m - 12^\circ, r_m + 12^\circ]$
by considering three standard deviations. In the figure, we show the straight
lines obtained using the lower and upper limits of this interval, and the area
in between is yellow. There are three such yellow areas whose intersection
region specifies where the target node is likely located. The target node (red star)
is located in this area, which is relatively small because the three receivers
observe the target from very different angles, but it would be even smaller
if $\sigma_\varphi$ were reduced. In Figure 8.19(c), we increase the number of receivers to
M = 10 by adding extra receivers on the edge of the half-circle where the
original receivers are located. There are many more yellow areas in this case,
but their intersection region remains roughly the same as in Figure 8.19(b)
because the new receivers cover angular directions between the previous ones.
We need to deploy receivers that observe the target from the right-hand side
or reduce the noise variance to get even higher estimation accuracy.

Since the measurement error is Gaussian distributed, the true DOA is more
likely to be at the center of the confidence interval than at the edges. We can
identify the most likely target location among those in the intersection region
of the yellow areas. This would be the ML estimate $(\hat{x}, \hat{y})$ of the target node
location. To formulate the ML estimation problem, we first introduce suitable
notation: the measurement vector $\mathbf{r} = [r_1,\ldots,r_M]^T \in \mathbb{R}^M$, the noise vector
$\mathbf{n} = [n_1,\ldots,n_M]^T \in \mathbb{R}^M$, and the theoretical azimuth DOA vector function
$\boldsymbol{\varphi}(x,y) = [\varphi_1(x,y),\ldots,\varphi_M(x,y)]^T \in \mathbb{R}^M$, where the DOA at receiver m is
given by the function

$$\varphi_m(x,y) = \arctan\left(\frac{y - y_m}{x - x_m}\right). \tag{8.66}$$

This function computes what the DOA angle would be for a specific guess
(x, y) of the target node location.

The ML estimation approach assumes that the target node’s unknown location
(x, y) is deterministic. The measurement vector $\mathbf{r} = \boldsymbol{\varphi}(x,y) + \mathbf{n}$ is distributed
according to the real Gaussian multivariate distribution $\mathcal{N}(\boldsymbol{\varphi}(x,y), \mathbf{C})$, where
$\mathbf{C} = \mathbb{E}\{\mathbf{n}\mathbf{n}^T\}$ is the covariance matrix of n. We know from (2.87) that the
PDF of r is

$$f(\mathbf{r}) = \frac{1}{(2\pi)^{\frac{M}{2}} \sqrt{\det(\mathbf{C})}} \, e^{-\frac{1}{2}(\mathbf{r} - \boldsymbol{\varphi}(x,y))^T \mathbf{C}^{-1} (\mathbf{r} - \boldsymbol{\varphi}(x,y))}. \tag{8.67}$$

The ML estimates of x and y are the values that jointly maximize (8.67),
which is equivalent to maximizing the argument of the exponential function
or minimizing $(\mathbf{r} - \boldsymbol{\varphi}(x,y))^T \mathbf{C}^{-1} (\mathbf{r} - \boldsymbol{\varphi}(x,y))$. Therefore, the ML estimates
are obtained by solving the problem

$$(\hat{x}, \hat{y}) = \arg\min_{(x,y)} \, (\mathbf{r} - \boldsymbol{\varphi}(x,y))^T \mathbf{C}^{-1} (\mathbf{r} - \boldsymbol{\varphi}(x,y)). \tag{8.68}$$

Figure 8.20: The RMSE of localization in (8.62) with respect to the number of receivers. The
DOA-based location estimate $(\hat{x}, \hat{y})$ is obtained by minimizing the ML objective function in
(8.68) using a gradient-descent algorithm. The same setup is considered as in Figure 8.19(c),
where the receivers are located along the edge of a half circle.
We have previously assumed that $\mathbf{C} = \sigma_\varphi^2 \mathbf{I}_M$, but this problem can be solved
with arbitrary noise covariance matrices (e.g., when some receivers have more
accurate measurements than others). The main issue is that $\varphi_m(x,y)$ is a
nonlinear function of x and y, which makes it computationally demanding
to find the solution to (8.68). As in the case of TDOA-based localization,
we can find the solution to a predefined accuracy by evaluating the objective 
function on a dense grid of potential (x, y)-values and picking the best of these 
points. Using an iterative gradient descent algorithm leads to a more tractable 
complexity, but convergence to the global optimum is not guaranteed. 
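As one example of an iterative solver for (8.68), the sketch below uses Gauss-Newton iterations, a standard method for nonlinear weighted least-squares problems; note that this is swapped in for illustration and is not the gradient-descent algorithm used for Figure 8.20. It assumes NumPy, $x > x_m$ for all receivers (so the arctan form (8.66) applies), a hypothetical initial guess, and the illustrative function names `doa_model`, `doa_jacobian`, and `ml_doa_gauss_newton`. Like gradient descent, it only converges to a local optimum near the starting point.

```python
import numpy as np

def doa_model(p, receivers):
    """phi_m(x, y) from (8.66); assumes x > x_m for every receiver."""
    dx = p[0] - receivers[:, 0]
    dy = p[1] - receivers[:, 1]
    return np.arctan(dy / dx)

def doa_jacobian(p, receivers):
    """Partial derivatives of phi_m with respect to x and y (one row per receiver)."""
    dx = p[0] - receivers[:, 0]
    dy = p[1] - receivers[:, 1]
    rho2 = dx**2 + dy**2
    return np.column_stack([-dy / rho2, dx / rho2])

def ml_doa_gauss_newton(r, receivers, C, p0, iters=50, tol=1e-9):
    """Gauss-Newton iterations for min (r - phi)^T C^{-1} (r - phi), i.e., problem (8.68)."""
    W = np.linalg.inv(C)
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        e = r - doa_model(p, receivers)          # residual r - phi(x, y)
        J = doa_jacobian(p, receivers)           # linearize phi around the current guess
        step = np.linalg.solve(J.T @ W @ J, J.T @ W @ e)
        p = p + step
        if np.linalg.norm(step) < tol:
            break
    return p

receivers = np.array([[-100.0, 0.0], [0.0, 100.0], [0.0, -100.0]])
target = np.array([100.0, 0.0])
sigma_phi = np.deg2rad(4.0)
C = sigma_phi**2 * np.eye(len(receivers))   # C = sigma_phi^2 I_M as assumed in the text

r_clean = doa_model(target, receivers)
print(ml_doa_gauss_newton(r_clean, receivers, C, p0=[90.0, 10.0]))   # approximately (100, 0)

rng = np.random.default_rng(3)
r_noisy = r_clean + sigma_phi * rng.standard_normal(len(receivers))
print(ml_doa_gauss_newton(r_noisy, receivers, C, p0=[90.0, 10.0]))
```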

In Figure 8.20, we plot the RMSE of the distance in (8.62) with respect
to the number of DOA receivers. The location estimate $(\hat{x}, \hat{y})$ is obtained by
minimizing the objective in (8.68) using a gradient-descent algorithm. We
consider the same setup as in Figure 8.19(c), except that we consider a varying
number of receivers and two different standard deviations of the measurement
noise: $\sigma_\varphi = 6^\circ$ and $\sigma_\varphi = 3^\circ$. The figure shows that the RMSE decreases
consistently with an increasing M, so having more receivers leads to better
localization accuracy. We previously noticed in Figure 8.19 that the intersection
region was nearly the same with M = 3 and M = 10 receivers. However,
the probability distribution within the region becomes more favorable as M
increases, which makes the ML estimate more accurate. Moreover, the noise
variance greatly impacts the localization accuracy: if the standard deviation
is cut in half, so is the RMSE.


Example 8.9. We have seen that M = 3 receivers are sufficient to estimate the 
target’s 2D location unambiguously with TOA- and TDOA-based methods, 
while M = 2 is sufficient in DOA-based localization. How many receivers are 
needed for 3D localization? 

The M circles determined by the TOA measurements in noise-free TOA-based
2D localization turn into M spheres in the 3D coordinate system. In
the noise-free case, there will be a unique intersection point (x, y, z) if there
are at least M = 4 spheres whose centers (the receivers) do not all lie in the same
plane. In noise-free TDOA localization, the TDOA
measurements define M — 1 hyperboloids in the 3D coordinate system. We 
need at least M — 1 = 3 hyperboloids to get a unique intersection point; thus, 
at least M = 4 receivers are needed for unambiguous localization with these 
two methods [133]. Hence, we can get away with the same number of receivers 
regardless of whether the target node is synchronized with the receivers or 
not. The TOA-based method will, however, provide more accurate location 
estimates in the noisy case. 
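The 3D TOA case can be checked with the same linearization as in 2D: subtracting the sphere equation of receiver 1 from those of receivers 2 to 4 gives three linear equations in (x, y, z). The sketch below assumes NumPy, hypothetical receiver positions that are not coplanar, a hypothetical target position, and the illustrative function name `trilaterate_3d`.

```python
import numpy as np

def trilaterate_3d(receivers, d):
    """Noise-free 3D trilateration from distances d_m to receivers p_m.

    Subtracting the sphere equation of receiver 1 from that of receiver m gives
    2 (p_m - p_1)^T p = ||p_m||^2 - ||p_1||^2 - (d_m^2 - d_1^2), linear in p = (x, y, z).
    """
    p1 = receivers[0]
    A = 2.0 * (receivers[1:] - p1)
    b = np.sum(receivers[1:] ** 2, axis=1) - np.sum(p1**2) - (d[1:] ** 2 - d[0] ** 2)
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    return est

# Hypothetical M = 4 receivers that are not all in one plane, and a hypothetical target.
receivers = np.array([[-100.0, 0.0, 0.0], [0.0, 100.0, 0.0],
                      [0.0, -100.0, 0.0], [0.0, 0.0, 100.0]])
target = np.array([100.0, 0.0, 20.0])
d = np.linalg.norm(receivers - target, axis=1)
print(trilaterate_3d(receivers, d))                                 # approximately [100, 0, 20]
print(np.linalg.matrix_rank(2.0 * (receivers[1:] - receivers[0])))  # 3: unique solution
```

If the four receivers were coplanar, the matrix `A` would have rank at most 2, which is consistent with the mirror-image ambiguity of the intersection point with respect to that plane.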

If each of the M receivers can estimate its azimuth and elevation DOA 
from the target node without noise, these measurements will define M lines in 
the 3D coordinate system. Since two non-identical lines can only intersect at 
one point, M = 2 receivers are sufficient for unambiguous 3D localization. This 
is the same triangulation principle as in the 2D case. Note that the receivers 
need two-dimensional arrays (e.g., UPAs) to estimate both the azimuth and 
elevation angles. If each receiver is instead equipped with a horizontal ULA, 
then there will be an ambiguity in the azimuth-elevation plane, as exemplified 
in Figure 8.9. In such a case, each DOA measurement defines a surface in the 
3D coordinate system, and we need at least M = 3 receivers to locate the 
target node unambiguously. 


DOA-based localization requires multiple antennas, unlike the TOA- and 
TDOA-based approaches that only require a single antenna. These methods 
build on different principles by measuring angles and ranges, respectively, and 
can be combined for even higher estimation accuracy. 
8.3 Target Detection


The methods described thus far in this chapter rely on the target node actively 
transmitting a signal so that physical parameters (e.g., time delays, angles, 
and location) can be estimated by a wireless system equipped with receive 
antennas. In radar applications, the target is instead passive, so the wireless 
system must both transmit signals and receive them. Radar is originally an 
abbreviation of radio detection and ranging; thus, its first aim is to detect 
targets, and its second aim is to estimate physical parameters such as the 
range. We will focus on the detection part in this section because the previous 
section described the fundamental principles for parameter estimation. 

Target detection is the core problem of detecting whether there is an object 
of interest at a particular location by sending known signal pulses toward 
that target location. A receiver located near the transmitter (or at another 
predefined location) listens to the noisy echoes of the transmitted signal, which 
might be reflected off the target of interest. If there is no target, the received 
signal in a free-space LOS scenario contains only noise. On the other hand, 
if there is a target, the attenuated reflected signal is received along with the 
noise. The task of the detector is to determine whether there is a target or not 
by processing the received signal and exploiting prior information regarding 
the signal characteristics. There are two events in target detection: 


• There is no target;
• The target exists.


The binary hypothesis testing framework outlined in Section 2.7 is commonly
used for target detection. In hypothesis testing, the absence of the target
represents the null hypothesis $\mathcal{H}_0$, whereas the alternative hypothesis $\mathcal{H}_1$
corresponds to the existence of the target. The detection method should take
the reflection properties of the target into account. Intuitively, it is easier to
detect an object if it is large, made of reflecting material, or happens to focus
its reflected signal toward the receiver. When a planar wavefront impinges on
the object from a specific angle, the reflected wave will have a complicated
shape determined by the object’s physical characteristics, as illustrated in
Figure 8.21. The receiver only observes the signal component that is reflected
toward it; thus, we can quantify the reflection using a single number $\sigma_{\mathrm{RCS}}$
called the radar cross section (RCS). Using antenna terminology, the RCS
is the effective area of the object when facing the transmitter multiplied by
the antenna gain toward the receiver for the reflected wave, so it is
measured in m². Suppose the power flux density of the impinging wave (i.e.,
the power of the electromagnetic field at the target location per unit area) is
Q, measured in W/m². The power reflected by the target is then


$$P = Q \, \sigma_{\mathrm{RCS}}. \tag{8.69}$$
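A small numerical illustration of (8.69) follows. It assumes, purely for illustration, an isotropic transmitter in free space, so that the power flux density at distance R is $Q = P_t/(4\pi R^2)$ for transmit power $P_t$; the transmit power, distance, and RCS values are hypothetical, and the function names are illustrative.

```python
import math

def isotropic_flux_density(P_t, R):
    """Power flux density Q [W/m^2] at distance R [m] from an isotropic transmitter of power P_t [W]."""
    return P_t / (4 * math.pi * R**2)

def reflected_power(Q, sigma_rcs):
    """Reflected power P = Q * sigma_RCS from (8.69)."""
    return Q * sigma_rcs

Q = isotropic_flux_density(P_t=1.0, R=100.0)   # hypothetical: 1 W transmitter, target 100 m away
for sigma_rcs in (0.01, 1.0, 100.0):           # hypothetical RCS values [m^2]
    print(f"sigma_RCS = {sigma_rcs:g} m^2 -> P = {reflected_power(Q, sigma_rcs):.3e} W")
```

Since P scales linearly with $\sigma_{\mathrm{RCS}}$, a target with a 100 times larger RCS reflects 100 times more power for the same impinging flux density.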
Figure 8.21: The RCS $\sigma_{\mathrm{RCS}}$ [m²] quantifies how the target object reflects an impinging signal
from the transmitter toward the receiver. It can be interpreted as the effective area of the object
toward the transmitter multiplied by the antenna gain achieved by the reflected wavefront in the
receiver direction. The RCS depends on the object’s physical properties and the location/rotation
of the transmitter, receiver, and object.


The RCS is the cumulative effect of the diffuse/specular reflection at different 
parts of the target. The value fluctuates as the target moves and is rotated 
because the effective area toward the transmitter and the antenna gain toward 
the receiver are angle-dependent. Several approaches exist in the radar litera- 
ture to statistically characterize the reflection of a target [134]. One key factor 
that creates modeling differences is the fluctuation frequency. The so-called 
Swerling models developed by Peter Swerling in the 1950s [134]-[136] take 
into account different fluctuating conditions and use different probability dis- 
tributions. In this chapter, we will outline the basic target detection methods 
for two such models: Swerling 1 and Swerling 2. 


Apart from the target reflection, the number of transmitters and receivers
and their locations also affect the detection problem and solution
method. In Figure 8.22, we illustrate three categories of setups used for target 
detection. Each category can also be used for radar and sensing applications 
other than target detection. The basic setup is mono-static sensing, shown 
in Figure 8.22(a), where the transmitter and receiver are co-located. In this 
figure, the solid lines represent the radiated signal from the transmitter(s) 
toward the target, and the dashed lines represent the received signals at 
the receiver(s) after being reflected by the target. The antenna array is 
typically divided into two parts, where one is used for transmission and the 
other for reception. Figure 8.22(b) shows a bi-static sensing setup where 
the transmitter and the receiver are at different locations, thereby viewing 
the target from different angles. The RCS will be different in the bi-static 
and mono-static cases because the angles from the transmitter and receiver 
determine the RCS. The detection performance can be improved using multiple 
transmitters and receivers, which operate in mono-static or bi-static sensing 
mode. Figure 8.22(c) illustrates the corresponding multi-static sensing case. 
For example, the operation represented by purple lines is mono-static, whereas 
the red lines represent a bi-static setup. It is also possible to exploit the 
received signals at multiple receivers for detection, as shown by the green lines. 
(a) Mono-static sensing for target detection.
(b) Bi-static sensing for target detection.


(c) Multi-static sensing for target detection. 


Figure 8.22: Three categories of sensing systems are illustrated: mono-static, bi-static, and 
multi-static. Each system can be used for target detection but also other sensing applications. 
The solid lines represent the transmitted signal from each transmitter toward the potential target 
location. The dashed lines represent the received signal at each receiver after being reflected by 
the target object. 
Another alternative is combining the mono-static and bi-static setups, as the
blue lines show. Each of these propagation paths experiences a different RCS
value for the same object due to the different transmission/reception angles.
The primary purpose of multi-static sensing is to exploit spatial diversity
because the RCS value can be very small for some angles and transmitter-receiver
pairs but likely not for all combinations simultaneously.


8.3.1 Radar Range Equation 


We will now derive the radar range equation, which describes the average 
received power when a signal with a specific power is transmitted toward and 
reflected by the target object. The received power depends on the transmit 
power, frequency, antenna gains, distances to the target, and the RCS. We 
begin by considering the radar range equation for the bi-static sensing case in 
Figure 8.22(b). Initially, we assume a single-antenna transmitter that sends a
signal with power $P_t$ and has the antenna gain function $G_t(\varphi_t, \theta_t)$, where the
angles $(\varphi_t, \theta_t)$ lead from the transmitter to the target. In a free-space LOS
propagation scenario with the distance $d_t$ to the target, the power flux density
at the target location will be

$$Q = \frac{P_t G_t(\varphi_t, \theta_t)}{4\pi d_t^2} \quad \text{W/m}^2 \tag{8.70}$$

because the power is divided over a sphere with surface area $4\pi d_t^2$.

The RCS of the target is denoted $\sigma_{\mathrm{RCS}}$ in m². Practical RCS values can
vary immensely; thus, the decibel scale is often used when specifying them.
By taking one square meter as the reference value, the RCS can be reported
in decibel-of-square-meter (dBsm) as $10\log_{10}(\sigma_{\mathrm{RCS}}/1\,\mathrm{m}^2)$. Measured values from
−50 dBsm (insects) to 60 dBsm (large ships, aircraft carriers) are reported in
[134]. The RCS value is not always proportional to the size of the object; for
example, the typical RCS value of a small truck is 20 dBsm while it is only
8 dBsm for a large fighter aircraft and even smaller for stealth aircraft [137].
We will now determine how the RCS value affects the received signal power.
For a given value of $\sigma_{\mathrm{RCS}}$, the effective isotropic reflected power from the target
toward the receiver is $Q\sigma_{\mathrm{RCS}}$. We use the term “effective isotropic” similar to
how the EIRP concept was defined in Section 4.5.5: the reflected power emitted
toward the receiver is the same as if the target had an isotropic antenna
that transmits with power $Q\sigma_{\mathrm{RCS}}$. The total reflected power can be entirely
different because an object typically does not reflect power isotropically;
instead, $\sigma_{\mathrm{RCS}}$ depends on the angles that lead to the transmitter and receiver.

Suppose the receiver is also equipped with a single antenna and has the
antenna gain function $G_r(\varphi_r, \theta_r)$, where the angles $(\varphi_r, \theta_r)$ lead from the
receiver to the target. In a free-space LOS propagation scenario with the
distance $d_r$ from the target to the receiver, the channel gain from an isotropic
transmitter (effective isotropic reflector in this case) to the receiver is given
by (1.40) as $\beta = \frac{\lambda^2}{(4\pi d_r)^2} G_r(\varphi_r, \theta_r)$. Hence, the received signal power is

$$P_r = Q\sigma_{\mathrm{RCS}}\beta = \frac{P_t G_t(\varphi_t, \theta_t)}{4\pi d_t^2}\,\sigma_{\mathrm{RCS}}\,\frac{\lambda^2}{(4\pi d_r)^2} G_r(\varphi_r, \theta_r) = \frac{P_t G_t(\varphi_t, \theta_t) G_r(\varphi_r, \theta_r) \lambda^2 \sigma_{\mathrm{RCS}}}{(4\pi)^3 d_t^2 d_r^2}. \tag{8.71}$$


This is known as the radar range equation and applies to a bi-static setup. The 
received power is proportional to the RCS and will later be used to distinguish 
the signal from the noise. An object with a small RCS provides a smaller SNR 
and is, therefore, more challenging to detect. 

We can use (8.71) to determine the mono-static radar range equation. 
Since the angles and distances are now the same to and from the target, it 
holds that $\varphi_t = \varphi_r = \varphi$, $\theta_t = \theta_r = \theta$, and $d_t = d_r = d$. Inserting these values
without subscripts into (8.71), we obtain the radar range equation for the 
mono-static case as 


$$P_r = \frac{P_t G_t(\varphi, \theta) G_r(\varphi, \theta) \lambda^2 \sigma_{\mathrm{RCS}}}{(4\pi)^3 d^4}. \tag{8.72}$$


One crucial difference from the bi-static case is that the RCS $\sigma_{\mathrm{RCS}}$ depends
on the location and orientation of two nodes instead of three.
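As a numerical sanity check of (8.71) and its mono-static special case (8.72), the following Python sketch evaluates the received power directly from the formulas. The function names and the example parameter values are illustrative assumptions, not part of the original text; the dBsm helper simply implements $10\log_{10}(\sigma_{\mathrm{RCS}}/1\,\mathrm{m}^2)$.

```python
import math

def radar_range_bistatic(p_t, g_t, g_r, wavelength, rcs, d_t, d_r):
    """Received power P_r [W] from the bi-static radar range equation (8.71)."""
    return (p_t * g_t * g_r * wavelength**2 * rcs
            / ((4 * math.pi) ** 3 * d_t**2 * d_r**2))

def radar_range_monostatic(p_t, g_t, g_r, wavelength, rcs, d):
    """Mono-static special case (8.72), obtained with d_t = d_r = d."""
    return p_t * g_t * g_r * wavelength**2 * rcs / ((4 * math.pi) ** 3 * d**4)

def to_dbsm(rcs_m2):
    """Express an RCS value in dBsm, i.e., 10*log10(rcs / 1 m^2)."""
    return 10 * math.log10(rcs_m2)

# Illustrative values: 10 W, gains of 2, 1 cm wavelength, 1 m^2 target at 100 m.
p_bi = radar_range_bistatic(10, 2, 2, 0.01, 1.0, d_t=100, d_r=100)
p_mono = radar_range_monostatic(10, 2, 2, 0.01, 1.0, d=100)
print(f"bi-static: {p_bi:.3e} W, mono-static: {p_mono:.3e} W")
print(f"100 m^2 corresponds to {to_dbsm(100.0):.1f} dBsm")
```

With equal distances, both functions return the same value, and the received power scales linearly with $\sigma_{\mathrm{RCS}}$ and inversely with $d_t^2 d_r^2$.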


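Example 8.10. Suppose an SNR of −10 dB is needed to detect the target. What is the smallest RCS that enables target detection if $P_t = 10$ W,
$G_t(\varphi, \theta) = G_r(\varphi, \theta) = 2$, $\lambda = 0.01$ m (i.e., $f = 30$ GHz), $B = 100$ MHz,
$d = 100$ m, and $N_0 = 10^{-20.4}$ W/Hz?

By substituting the given values into (8.72) and dividing by the noise
variance $N_0 B = 10^{-20.4+8} = 10^{-12.4}$ W, we obtain the SNR as

$$\mathrm{SNR} = \frac{P_r}{N_0 B} = \frac{10 \cdot 2^2 \cdot 0.01^2 \, \sigma_{\mathrm{RCS}}}{(4\pi)^3 \cdot 100^4 \cdot 10^{-12.4}}. \tag{8.73}$$

We can now solve the inequality $\mathrm{SNR} \geq -10$ dB (i.e., $\mathrm{SNR} \geq 0.1$) for $\sigma_{\mathrm{RCS}}$ to obtain that the
RCS should be at least

$$\sigma_{\mathrm{RCS}} \geq \frac{0.1 \cdot (4\pi)^3 \cdot 10^{-4.4}}{10 \cdot 2^2 \cdot 0.01^2} \approx 1.98\,\mathrm{m}^2 \approx 2.96 \ \text{dBsm}. \tag{8.74}$$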


The bi-static received power in (8.71) is proportional to the squared 
wavelength, which implies that it reduces when the carrier frequency is 
increased if the antenna gain functions and RCS are fixed. However, we 
can also rewrite the received power in terms of the effective areas $A_t(\varphi_t, \theta_t) = \frac{\lambda^2}{4\pi} G_t(\varphi_t, \theta_t)$
and $A_r(\varphi_r, \theta_r) = \frac{\lambda^2}{4\pi} G_r(\varphi_r, \theta_r)$ of the transmitter and receiver. In
this case, (8.71) becomes

$$P_r = \frac{P_t A_t(\varphi_t, \theta_t) A_r(\varphi_r, \theta_r) \sigma_{\mathrm{RCS}}}{4\pi\lambda^2 d_t^2 d_r^2}. \tag{8.75}$$
This expression is inversely proportional to the squared wavelength, which 
implies that it increases when the carrier frequency is increased if the effective 
antenna areas and RCS are constant. Hence, target detection can become 
easier at higher frequencies, particularly if antenna arrays are utilized to 
achieve large effective areas toward the target. 
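To illustrate the two readings of the frequency dependence, the sketch below evaluates (8.71) with fixed antenna gains and (8.75) with fixed effective areas at 30 GHz and 60 GHz. The specific numbers (10 W, 1 m² RCS, 100 m distances, gains of 2, areas of $10^{-4}$ m²) and the function names are hypothetical, and the speed of light is rounded to $3 \cdot 10^8$ m/s.

```python
import math

C = 3e8  # speed of light [m/s], rounded

def rx_power_from_gains(p_t, g_t, g_r, wavelength, rcs, d_t, d_r):
    # Bi-static radar range equation (8.71) with antenna gains.
    return p_t * g_t * g_r * wavelength**2 * rcs / ((4 * math.pi) ** 3 * d_t**2 * d_r**2)

def rx_power_from_areas(p_t, a_t, a_r, wavelength, rcs, d_t, d_r):
    # Same equation rewritten as (8.75) with effective areas A = lambda^2 G / (4 pi).
    return p_t * a_t * a_r * rcs / (4 * math.pi * wavelength**2 * d_t**2 * d_r**2)

common = dict(p_t=10.0, rcs=1.0, d_t=100.0, d_r=100.0)
for f in (30e9, 60e9):
    lam = C / f
    fixed_gain = rx_power_from_gains(g_t=2.0, g_r=2.0, wavelength=lam, **common)
    fixed_area = rx_power_from_areas(a_t=1e-4, a_r=1e-4, wavelength=lam, **common)
    print(f"{f / 1e9:.0f} GHz: fixed gains {fixed_gain:.2e} W, fixed areas {fixed_area:.2e} W")
```

Doubling the frequency should divide the fixed-gain result by four and multiply the fixed-area result by four, while both functions agree whenever the areas are computed from the gains.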

The radar range equation can be easily extended to manage the case where 
the transmitter is equipped with K antennas, whereas the receiver has M 
antennas. When inspecting whether a target exists at a specific location, the 
transmitter can apply MRT precoding towards the prospective target location, 
while the receiver can apply MRC. We then achieve a combined beamforming 
gain of MK over a LOS channel, as shown in Section 4.4. The radar range 
equation in (8.71) for the bi-static setup is generalized by multiplying with 
the beamforming gain, which results in 


$$P_r = \frac{P_t G_t(\varphi_t, \theta_t) G_r(\varphi_r, \theta_r) M K \lambda^2 \sigma_{\mathrm{RCS}}}{(4\pi)^3 d_t^2 d_r^2}. \tag{8.76}$$


MRT focuses the transmission in a specific direction. If the target location is 
unknown (e.g., we want to detect if a vehicle exists somewhere on the road), 
the transmitter must scan for the target by sending beamformed signals in 
different directions. The orthogonal DFT beams described in Section 4.3.3 can 
be used to cover all dimensions, but a denser grid of non-orthogonal beams 
can also be used to ensure that nearly the maximum beamforming gain is 
achieved in any potential target direction. This kind of radar sweeping is 
often presented as a circle with a rotating beam in movies, and the detected 
targets show up as dots. Conventional radar systems perform mechanical 
beamforming by rotating the array instead of using electrical beamforming. 
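The benefit of a denser beam grid can be quantified with a short sketch. It assumes a uniform linear array with $M$ antennas and half-wavelength spacing, whose array response toward a direction with $u = \sin(\varphi)$ is $[1, e^{j\pi u}, \ldots, e^{j\pi (M-1) u}]^T$; this parameterization and the function names are assumptions for illustration and may differ from the notation in Section 4.3.3. For each candidate target direction, the best beam in the grid is selected, and the worst case over all directions is reported as a fraction of the maximum beamforming gain. An oversampling factor of one gives the orthogonal DFT grid.

```python
import cmath
import math

def ula_response(u, m_ant):
    # Assumed ULA with half-wavelength spacing; u = sin(angle) in [-1, 1).
    return [cmath.exp(1j * math.pi * m * u) for m in range(m_ant)]

def worst_case_gain(m_ant, oversampling, n_test=2000):
    """Worst-case (over directions) fraction of the maximum beamforming gain
    achieved by the best beam in a grid of oversampling * m_ant beams."""
    n_beams = oversampling * m_ant
    beams = [ula_response(-1 + 2 * k / n_beams, m_ant) for k in range(n_beams)]
    worst = 1.0
    for i in range(n_test):
        target = ula_response(-1 + 2 * i / n_test, m_ant)
        # |a(u_beam)^H a(u_target)|^2 / M^2 for every beam; keep the best one.
        best = max(
            abs(sum(b.conjugate() * t for b, t in zip(beam, target))) ** 2
            for beam in beams
        ) / m_ant**2
        worst = min(worst, best)
    return worst

m_ant = 8
w_dft = worst_case_gain(m_ant, oversampling=1)    # orthogonal DFT beams
w_dense = worst_case_gain(m_ant, oversampling=2)  # twice as many non-orthogonal beams
print(f"DFT grid: {10 * math.log10(w_dft):.2f} dB, 2x denser grid: {10 * math.log10(w_dense):.2f} dB")
```

With $M = 8$, the orthogonal grid loses close to 4 dB midway between two neighboring beams, while the twice-denser grid stays within about 1 dB of the maximum gain in every direction.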


Example 8.11. Consider the mono-static setup from Example 8.10. What is 
the minimum RCS value that a detectable target can have if the number of 
antennas at the transmitter and receiver is M = K = 4? 

Due to the beamforming gain of $MK = 4 \cdot 4 = 16$, the SNR is improved by
a factor of 16 compared to the single-antenna case in (8.73). If we solve the
inequality $\mathrm{SNR} \geq -10$ dB for $\sigma_{\mathrm{RCS}}$, we obtain

$$\sigma_{\mathrm{RCS}} \geq \frac{0.1 \cdot (4\pi)^3 \cdot 10^{-4.4}}{16 \cdot 10 \cdot 2^2 \cdot 0.01^2} \approx 0.12\,\mathrm{m}^2 \approx -9.1 \ \text{dBsm}. \tag{8.77}$$

A target with 16 times smaller RCS can be detected thanks to the beamforming
gain. Even smaller targets can be found by using more antennas.
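The arithmetic in Examples 8.10 and 8.11 can be reproduced with a few lines of Python. This is a minimal sketch under the examples' assumptions (mono-static LOS geometry, an SNR requirement of −10 dB, and a beamforming gain $MK$ as in (8.76)); the function name and keyword arguments are hypothetical.

```python
import math

def min_detectable_rcs(snr_req_db, p_t, g_t, g_r, wavelength, d, n0, bandwidth,
                       m_rx=1, k_tx=1):
    """Smallest sigma_RCS [m^2] with SNR >= requirement, from (8.72) scaled by MK."""
    snr_req = 10 ** (snr_req_db / 10)
    noise_power = n0 * bandwidth  # N_0 B
    # Mono-static received power per square meter of RCS, including the gain MK.
    p_per_rcs = m_rx * k_tx * p_t * g_t * g_r * wavelength**2 / ((4 * math.pi) ** 3 * d**4)
    return snr_req * noise_power / p_per_rcs

params = dict(snr_req_db=-10, p_t=10, g_t=2, g_r=2, wavelength=0.01, d=100,
              n0=10 ** -20.4, bandwidth=100e6)
rcs_single = min_detectable_rcs(**params)                # Example 8.10
rcs_array = min_detectable_rcs(**params, m_rx=4, k_tx=4)  # Example 8.11
for name, rcs in (("single antenna", rcs_single), ("M = K = 4", rcs_array)):
    print(f"{name}: {rcs:.2f} m^2 = {10 * math.log10(rcs):.2f} dBsm")
```

The printed values agree with 1.98 m² (2.96 dBsm) and 0.12 m² (about −9.1 dBsm).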
In the remainder of this chapter, we will consider the Swerling 1 and
Swerling 2 target models, in which the target consists of many small diffuse
reflectors that contribute to the overall effective RCS. Similar to the derivation
of the Rayleigh fading channel in Section 5.1.1, the independent random-like
phase variations across the many reflectors give rise to a complex Gaussian
coefficient in the complex baseband: $c_{\mathrm{RCS}} \sim \mathcal{N}_{\mathbb{C}}(0, \sigma_{\mathrm{RCS}})$. In fact, the target
behaves as a multipath cluster when interacting with wireless signals. The
magnitude $|c_{\mathrm{RCS}}|$ has a Rayleigh distribution, while the RCS realization
$|c_{\mathrm{RCS}}|^2$ has an exponential distribution that satisfies $\mathbb{E}\{|c_{\mathrm{RCS}}|^2\} = \sigma_{\mathrm{RCS}}$.
Hence, we will now treat $\sigma_{\mathrm{RCS}}$ as the average RCS value and $|c_{\mathrm{RCS}}|^2$ as the
random realization. In analogy with the slow fading case in Chapter 5, in the
Swerling 1 target model, the RCS realization is assumed to be fixed throughout
the signal transmission interval used for target detection. When the target’s
RCS fluctuates more rapidly, the Swerling 2 target model can be used, where
the RCS takes a new independent realization for each transmitted symbol. The
latter is the radar counterpart of the fast fading in communication channels.
We will analyze the target detection problem for each of these models.
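The difference between the two models lies only in how often a new RCS coefficient is drawn. The sketch below generates $c_{\mathrm{RCS}}$ for one detection interval under each model and checks empirically that $|c_{\mathrm{RCS}}|^2$ is exponentially distributed with mean $\sigma_{\mathrm{RCS}}$ (so that, e.g., $\Pr\{|c_{\mathrm{RCS}}|^2 > \sigma_{\mathrm{RCS}}\} = e^{-1}$). The helper names, the random seed, and the value $\sigma_{\mathrm{RCS}} = 2$ m² are arbitrary choices for illustration.

```python
import math
import random

def complex_gaussian(rng, variance):
    # Circularly symmetric N_C(0, variance): real and imaginary parts have variance/2 each.
    std = math.sqrt(variance / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))

def rcs_coefficients(model, n_symbols, avg_rcs, rng):
    """RCS coefficients c_RCS for one detection interval of n_symbols symbols."""
    if model == 1:  # Swerling 1: a single realization for the whole interval
        c = complex_gaussian(rng, avg_rcs)
        return [c] * n_symbols
    if model == 2:  # Swerling 2: a new independent realization per symbol
        return [complex_gaussian(rng, avg_rcs) for _ in range(n_symbols)]
    raise ValueError("model must be 1 or 2")

rng = random.Random(1)
avg_rcs = 2.0
realizations = [abs(c) ** 2 for c in rcs_coefficients(2, 50000, avg_rcs, rng)]
mean_rcs = sum(realizations) / len(realizations)
tail = sum(r > avg_rcs for r in realizations) / len(realizations)
print(f"mean |c|^2 = {mean_rcs:.3f} (expected {avg_rcs}), "
      f"P(|c|^2 > mean) = {tail:.3f} (expected {math.exp(-1):.3f})")
```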


8.3.2 Target Detection with the Swerling 1 Target Model 


In the Swerling 1 model, the target’s RCS is assumed to fluctuate slowly, so 
it is fixed throughout the L received symbols collected for target detection. 
If the target exists at the analyzed location, the received signal power is
$P_r|c_{\mathrm{RCS}}|^2$, where $P_r$ is the average power given by the radar range equation
in (8.76) and $c_{\mathrm{RCS}} \sim \mathcal{N}_{\mathbb{C}}(0,1)$ models the randomness. Note that, unlike the
last section, $c_{\mathrm{RCS}}$ has unit variance because the average RCS $\sigma_{\mathrm{RCS}}$ is now
included in $P_r$ for notational convenience. We assume that a constant symbol
“1” is transmitted during the $L$ transmissions without loss of generality. Hence,
the corresponding binary hypothesis test is

$$\mathcal{H}_0: \; y[l] = n[l], \quad l = 1, \ldots, L, \tag{8.78}$$
$$\mathcal{H}_1: \; y[l] = \sqrt{P_r}\, c_{\mathrm{RCS}} + n[l], \quad l = 1, \ldots, L, \tag{8.79}$$

where the additive noise samples $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ are independent.

In radar applications, there is typically no prior knowledge of the hypothesis 
probabilities. Hence, the Neyman-Pearson detector from Section 2.7 can be 
used to maximize the detection probability $P_{\mathrm{D}}$ for a desired value $P_{\mathrm{FA}} = \alpha$ of
the false alarm probability. The Neyman-Pearson detector, which is optimal
in this sense, was presented in Lemma 2.14. It states that $\mathcal{H}_1$ is selected if

$$\frac{f_{\mathbf{y}|\mathcal{H}_1}(\mathbf{y}|\mathcal{H}_1)}{f_{\mathbf{y}|\mathcal{H}_0}(\mathbf{y}|\mathcal{H}_0)} \geq \gamma, \tag{8.80}$$

where the threshold $\gamma > 0$ will later be selected so that $P_{\mathrm{FA}} = \alpha$. To particularize
the Neyman-Pearson detector for the hypothesis test in (8.78)-(8.79),
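we collect all the received samples in a vector $\mathbf{y} = [y[1], \ldots, y[L]]^T \in \mathbb{C}^L$ and
define the noise vector $\mathbf{n} = [n[1], \ldots, n[L]]^T \in \mathbb{C}^L$. By letting $\mathbf{1}_L$ denote the
$L$-dimensional vector with only ones, the received signal vector in (8.79) under
the hypothesis $\mathcal{H}_1$ can be expressed as

$$\mathbf{y} = \sqrt{P_r}\,\mathbf{1}_L c_{\mathrm{RCS}} + \mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, P_r \mathbf{1}_L \mathbf{1}_L^T + \sigma^2 \mathbf{I}_L). \tag{8.81}$$

On the other hand, we have $\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2 \mathbf{I}_L)$ when the hypothesis $\mathcal{H}_0$ is true.
We can use the PDF of a complex Gaussian vector in (2.85) to evaluate the
likelihood ratio in (8.80) as

$$\frac{f_{\mathbf{y}|\mathcal{H}_1}(\mathbf{y}|\mathcal{H}_1)}{f_{\mathbf{y}|\mathcal{H}_0}(\mathbf{y}|\mathcal{H}_0)} = \frac{\frac{1}{\pi^L \det(P_r \mathbf{1}_L \mathbf{1}_L^T + \sigma^2 \mathbf{I}_L)} e^{-\mathbf{y}^H (P_r \mathbf{1}_L \mathbf{1}_L^T + \sigma^2 \mathbf{I}_L)^{-1} \mathbf{y}}}{\frac{1}{\pi^L \det(\sigma^2 \mathbf{I}_L)} e^{-\mathbf{y}^H (\sigma^2 \mathbf{I}_L)^{-1} \mathbf{y}}} \geq \gamma. \tag{8.82}$$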


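Using the fact that $\ln(\gamma)$ is a monotonically increasing function of $\gamma > 0$, the
Neyman-Pearson detector in (8.82) decides on the hypothesis $\mathcal{H}_1$ if

$$\sigma^{-2}\mathbf{y}^H\mathbf{y} - \mathbf{y}^H (P_r \mathbf{1}_L \mathbf{1}_L^T + \sigma^2 \mathbf{I}_L)^{-1} \mathbf{y} \geq \ln(\gamma) - \ln(b), \tag{8.83}$$

where the constant $b = \det(\sigma^2 \mathbf{I}_L) / \det(P_r \mathbf{1}_L \mathbf{1}_L^T + \sigma^2 \mathbf{I}_L)$ is independent of
the received signal $\mathbf{y}$. Using the rank-one update formula in (2.48), we have

$$(P_r \mathbf{1}_L \mathbf{1}_L^T + \sigma^2 \mathbf{I}_L)^{-1} = \sigma^{-2} \mathbf{I}_L - \frac{P_r \sigma^{-4}}{1 + P_r L \sigma^{-2}} \mathbf{1}_L \mathbf{1}_L^T. \tag{8.84}$$

Inserting this result into (8.83), the detector decides on $\mathcal{H}_1$ if

$$|\mathbf{1}_L^T \mathbf{y}|^2 \geq \frac{(1 + P_r L \sigma^{-2})(\ln(\gamma) - \ln(b))}{P_r \sigma^{-4}} = \gamma', \tag{8.85}$$

where $\gamma'$ is the revised threshold variable that must be selected so that
$P_{\mathrm{FA}} = \alpha$. We have $\mathbf{1}_L^T \mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(0, L\sigma^2)$ if hypothesis $\mathcal{H}_0$ is true, which implies
that $|\mathbf{1}_L^T \mathbf{y}|^2 \sim \mathrm{Exp}(1/(L\sigma^2))$. Hence, we can compute the threshold using
(2.91) as

$$P_{\mathrm{FA}} = \int_{\gamma'}^{\infty} f_{|\mathbf{1}_L^T\mathbf{y}|^2|\mathcal{H}_0}(y|\mathcal{H}_0)\, dy = \int_{\gamma'}^{\infty} \frac{1}{L\sigma^2} e^{-\frac{y}{L\sigma^2}}\, dy = e^{-\frac{\gamma'}{L\sigma^2}} = \alpha \;\Rightarrow\; \gamma' = L\sigma^2 \ln(\alpha^{-1}). \tag{8.86}$$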


This threshold grows as the specified false alarm probability $\alpha$ decreases,
since it is proportional to $\ln(\alpha^{-1})$. If $\alpha$ is reduced, the threshold $\gamma'$ increases, but the detection probability

$$P_{\mathrm{D}} = \int_{L\sigma^2 \ln(\alpha^{-1})}^{\infty} f_{|\mathbf{1}_L^T\mathbf{y}|^2|\mathcal{H}_1}(y|\mathcal{H}_1)\, dy \tag{8.87}$$

becomes smaller since we integrate the PDF over a smaller set of values. This
result highlights a fundamental tradeoff in target detection: a large detection
probability is associated with a large false alarm probability, and vice versa.
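The detector in (8.85) with the threshold in (8.86) can be checked by Monte Carlo simulation. This sketch, with hypothetical function names and illustrative parameters ($\alpha = 0.05$, $L = 10$, $\sigma^2 = 1$, $P_r = 0.1$), draws one $c_{\mathrm{RCS}}$ per detection interval as in the Swerling 1 model, combines the $L$ samples coherently, and compares $|\mathbf{1}_L^T\mathbf{y}|^2$ with $\gamma' = L\sigma^2\ln(\alpha^{-1})$. The empirical false alarm rate should be close to $\alpha$.

```python
import math
import random

def cn(rng, variance):
    # Circularly symmetric complex Gaussian N_C(0, variance).
    std = math.sqrt(variance / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))

def swerling1_threshold(n_symbols, noise_var, alpha):
    # gamma' = L * sigma^2 * ln(1/alpha), see (8.86).
    return n_symbols * noise_var * math.log(1 / alpha)

def swerling1_statistic(y):
    # Statistic |1_L^T y|^2: coherent sum, then squared magnitude.
    return abs(sum(y)) ** 2

def decision_rate(target_present, p_r, n_symbols, noise_var, alpha, trials, rng):
    """Fraction of trials in which the detector decides H1."""
    thr = swerling1_threshold(n_symbols, noise_var, alpha)
    decisions = 0
    for _ in range(trials):
        c = cn(rng, 1.0)  # Swerling 1: one c_RCS ~ N_C(0, 1) for all L symbols
        signal = math.sqrt(p_r) * c if target_present else 0.0
        y = [signal + cn(rng, noise_var) for _ in range(n_symbols)]
        decisions += swerling1_statistic(y) >= thr
    return decisions / trials

rng = random.Random(7)
alpha, n_symbols, noise_var, p_r = 0.05, 10, 1.0, 0.1
p_fa_hat = decision_rate(False, p_r, n_symbols, noise_var, alpha, 20000, rng)
p_d_hat = decision_rate(True, p_r, n_symbols, noise_var, alpha, 20000, rng)
print(f"empirical P_FA = {p_fa_hat:.3f} (target {alpha}), empirical P_D = {p_d_hat:.3f}")
```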

The term $|\mathbf{1}_L^T \mathbf{y}|^2$ in (8.85) is called the sufficient statistic for target detection
because it is the only quantity that must be measured and compared to
the threshold $\gamma'$ to implement the Neyman-Pearson detector, and it determines
the detection probability in (8.87). Hence, the optimal receiver processing for
target detection coherently combines the $L$ received signals as $\mathbf{1}_L^T \mathbf{y}$ and then
compares its power (i.e., its squared magnitude) to the predefined threshold
$\gamma'$. We note that the realization of $c_{\mathrm{RCS}}$ is unknown, but coherent combining
is achievable anyway because the realization is the same for all the $L$ received
symbols. In particular, under hypothesis $\mathcal{H}_1$, it holds that

$$\mathbb{E}\{|\mathbf{1}_L^T \mathbf{y}|^2\} = \mathbf{1}_L^T (P_r \mathbf{1}_L \mathbf{1}_L^T + \sigma^2 \mathbf{I}_L) \mathbf{1}_L = L(P_r L + \sigma^2). \tag{8.88}$$

The average effective SNR is $P_r L/\sigma^2$, which increases proportionally to $L$. We
also recall from (8.76) that $P_r$ is proportional to the beamforming gain $MK$.

To exemplify the Neyman-Pearson detector for solving the binary hypothesis
test with the Swerling 1 target model, we consider the false alarm
probability $P_{\mathrm{FA}} = \alpha = 10^{-3}$. Figure 8.23 shows the resulting detection probability,
$P_{\mathrm{D}}$, versus the single-antenna SNR, which is computed by dividing the
received power at a single antenna in (8.71) by the noise power $\sigma^2 = BN_0$.
We consider a symmetric setup where both the transmitter and receiver have
$M$ antennas (i.e., $K = M$). Hence, the effective SNR is obtained by multiplying
the single-antenna SNR on the horizontal axis by the beamforming
gain $M^2$. We compare three setups: i) $M = 1$ antenna and $L = 10$ symbols;
ii) $M = 10$ antennas and $L = 10$ symbols; and iii) $M = 10$ antennas and
$L = 100$ symbols. We notice that the detection probability improves with the
SNR in all three cases, which is logical since target detection revolves around
distinguishing signals from noise. The three curves have identical shapes but
are shifted horizontally. The solid black curve is furthest to the right since it
has the fewest antennas and symbols. The dashed red curve is shifted 20 dB
to the left because it has $M = 10$ antennas instead of one, which results in
a beamforming gain of $M^2 = 100 = 20$ dB. When the single-antenna SNR is
−10 dB, $P_{\mathrm{D}}$ increases from 0.03 to 0.93 when $M = 1$ is increased to $M = 10$;
thus, the use of multiple antennas can make a huge difference. When $M = 10$,
an additional performance improvement can be achieved by increasing the
number of symbols. When going from $L = 10$ to $L = 100$, the total received
power is increased by a factor of 10 thanks to the coherent combining. This
explains why the dash-dotted blue curve is shifted 10 dB to the left compared
to the red curve. Hence, it is possible to obtain reasonable detection probability
values at very low SNR values by utilizing many antennas or symbols. In
practice, there is a limit on how large $L$ can be made before the realization of
$c_{\mathrm{RCS}}$ changes due to target movement.
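Since $\mathbf{1}_L^T\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(0, L(P_rL + \sigma^2))$ under $\mathcal{H}_1$, the statistic $|\mathbf{1}_L^T\mathbf{y}|^2$ is exponentially distributed with the mean in (8.88), and inserting $\gamma'$ from (8.86) into (8.87) gives $P_{\mathrm{D}} = e^{-\gamma'/(L(P_rL+\sigma^2))} = \alpha^{1/(1 + P_rL/\sigma^2)}$. The sketch below evaluates this expression, assuming, as in Figure 8.23, that $P_r/\sigma^2$ equals the single-antenna SNR multiplied by $M^2$; the function name is hypothetical.

```python
import math

def swerling1_pd(snr_single_db, m_ant, n_symbols, alpha):
    """Detection probability of the Swerling 1 detector, assuming K = M so that
    P_r / sigma^2 is the single-antenna SNR times the beamforming gain M^2."""
    per_symbol_snr = 10 ** (snr_single_db / 10) * m_ant**2
    # P_D = alpha ** (1 / (1 + P_r L / sigma^2)), from (8.86)-(8.88).
    return alpha ** (1 / (1 + per_symbol_snr * n_symbols))

for m_ant, n_symbols in ((1, 10), (10, 10), (10, 100)):
    pd = swerling1_pd(-10, m_ant, n_symbols, alpha=1e-3)
    print(f"M = {m_ant}, L = {n_symbols}: P_D = {pd:.2f}")
```

At a single-antenna SNR of −10 dB, the first two cases give the values 0.03 and 0.93 quoted above, and the formula also makes the 20 dB and 10 dB horizontal shifts explicit, since $P_{\mathrm{D}}$ depends only on the product of the SNR, $M^2$, and $L$.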
Figure 8.23: The detection probability for different numbers of transmit/receive antennas and
received symbols with respect to the single-antenna SNR for the Swerling 1 model. 


Example 8.12. We assumed that the transmitter sends the constant symbol 
“1” throughout the L symbol times when formulating the hypothesis test in 
(8.78)-(8.79). What changes in the Neyman-Pearson detector if the transmitted 
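signal is $\mathbf{x} = [x[1], \ldots, x[L]]^T \in \mathbb{C}^L$, which is known at the receiver?

The new received signal vector can be expressed as $\mathbf{y} = \sqrt{P_r}\,\mathbf{x} c_{\mathrm{RCS}} + \mathbf{n}$ under
the hypothesis $\mathcal{H}_1$. This vector is distributed as $\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, P_r \mathbf{x}\mathbf{x}^H + \sigma^2 \mathbf{I}_L)$,
while it still holds that $\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2 \mathbf{I}_L)$ when the hypothesis $\mathcal{H}_0$ is true.
Following similar steps as in (8.82)-(8.85), we end up with a Neyman-Pearson
detector that decides on the hypothesis $\mathcal{H}_1$ if $|\mathbf{x}^H \mathbf{y}|^2 \geq \gamma'$, where the threshold
$\gamma'$ is selected to have the desired value $P_{\mathrm{FA}} = \alpha$ as

$$P_{\mathrm{FA}} = \alpha = \int_{\gamma'}^{\infty} f_{|\mathbf{x}^H\mathbf{y}|^2|\mathcal{H}_0}(y|\mathcal{H}_0)\, dy = \int_{\gamma'}^{\infty} \frac{1}{\|\mathbf{x}\|^2\sigma^2} e^{-\frac{y}{\|\mathbf{x}\|^2\sigma^2}}\, dy = e^{-\frac{\gamma'}{\|\mathbf{x}\|^2\sigma^2}} \;\Rightarrow\; \gamma' = \|\mathbf{x}\|^2\sigma^2 \ln(\alpha^{-1}). \tag{8.89}$$

The optimal detector combines the received signals as $\mathbf{x}^H \mathbf{y}$, where each received
signal $y[l]$ is multiplied by $x^*[l]$ before being summed up. The multiplication
aligns the $L$ signals in phase, and if the magnitudes $|x[1]|, \ldots, |x[L]|$ vary,
it also weights them to maximize the SNR according to the MRC principle.
Finally, the detector compares $|\mathbf{x}^H \mathbf{y}|^2$ with the threshold $\gamma' = \|\mathbf{x}\|^2 \sigma^2 \ln(\alpha^{-1})$.
The average received power under $\mathcal{H}_1$ is $\mathbb{E}\{|\mathbf{x}^H \mathbf{y}|^2\} = P_r \|\mathbf{x}\|^4 + \sigma^2 \|\mathbf{x}\|^2$, which shows
that only the value $\|\mathbf{x}\|^2$ matters, not the individual symbols. This is why
$\mathbf{x} = \mathbf{1}_L$ works equally well as any other sequence that satisfies $\|\mathbf{x}\|^2 = L$.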
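A short Monte Carlo sketch can confirm that only $\|\mathbf{x}\|^2$ matters. It compares $\mathbf{x} = \mathbf{1}_L$ with a random unit-modulus QPSK sequence (both satisfy $\|\mathbf{x}\|^2 = L$), using the statistic $|\mathbf{x}^H\mathbf{y}|^2$ and the threshold in (8.89). The QPSK sequence, the parameter values, and all function names are illustrative assumptions.

```python
import cmath
import math
import random

def cn(rng, variance):
    # Circularly symmetric complex Gaussian N_C(0, variance).
    std = math.sqrt(variance / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))

def detect(y, x, noise_var, alpha):
    # Decide H1 if |x^H y|^2 >= ||x||^2 sigma^2 ln(1/alpha), cf. (8.89).
    stat = abs(sum(xl.conjugate() * yl for xl, yl in zip(x, y))) ** 2
    energy = sum(abs(xl) ** 2 for xl in x)
    return stat >= energy * noise_var * math.log(1 / alpha)

def detection_rate(x, p_r, noise_var, alpha, trials, rng, target_present=True):
    hits = 0
    for _ in range(trials):
        c = cn(rng, 1.0)  # Swerling 1: same c_RCS for all symbols in the interval
        gain = math.sqrt(p_r) * c if target_present else 0.0
        y = [gain * xl + cn(rng, noise_var) for xl in x]
        hits += detect(y, x, noise_var, alpha)
    return hits / trials

rng = random.Random(3)
L, alpha, p_r, noise_var = 10, 0.05, 0.1, 1.0
ones = [1.0 + 0j] * L
qpsk = [cmath.exp(1j * math.pi / 4 * (2 * rng.randrange(4) + 1)) for _ in range(L)]
rates = {name: detection_rate(x, p_r, noise_var, alpha, 20000, rng)
         for name, x in (("ones", ones), ("QPSK", qpsk))}
print(rates)  # both should be close to alpha ** (1 / (1 + p_r * L / noise_var))
```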
8.3.3 Target Detection with the Swerling 2 Target Model


In the Swerling 2 model, the target’s RCS is assumed to fluctuate so rapidly 
that it takes a new independent realization at each symbol time. The realization
at time $l$ is denoted by $c_{\mathrm{RCS}}[l] \sim \mathcal{N}_{\mathbb{C}}(0,1)$. We assume that $L$ received
signals are collected for target detection and that the constant symbol “1” is 
transmitted during all of them, as in the previous section. The corresponding 
binary hypothesis test is 


Ho : yl =ni], 1=1,...,L, (8.90) 
Hı : yil = V/Peercsll]+nfi], 1=1,...,L, (8.91) 


where P, is the average received power reflected through the target, which 
can be computed using the radar range equation in (8.76). The noise nl] ~ 
Nc(0, c?) and channel coefficients crcs{l] are independent. 

We will now particularize the Neyman-Pearson detector for this scenario, 
where the channel coefficients fluctuate. To prepare for this, we define the vec- 
tors y= [y(1], wm y[L]] = Cc’ CRCS = [crcs[1], sone ,cercs[L]]” = ce, and n = 
[n[1],...,n[L]]" € C}. We note that crcs ~ Nc(0, Iz) and n ~ Nc(0,07I;). 
When the null hypothesis Ho is correct, the received signal vector becomes 
y = n ~ Nc(0,o°Iz). When the hypothesis H is true, the received signal 
instead becomes y = yP,crcs +n ~ Nc(0, (P. + 07)I,). We can use the 
PDF of a complex Gaussian vector in (2.85) to evaluate the likelihood ratio 
in (2.191) from Lemma 2.14 as 


1 —y"((P,+0?)IL) } 
< fa (YH) aeae yË (P+) y ta 
fyno (y|Ho) aarre YO) y 


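Using the fact that $\ln(\gamma)$ is a monotonically increasing function of $\gamma > 0$, the
Neyman-Pearson detector in (8.92) decides on the hypothesis $\mathcal{H}_1$ if

$$\frac{\mathbf{y}^H \mathbf{y}}{\sigma^2} - \frac{\mathbf{y}^H \mathbf{y}}{P_r + \sigma^2} \geq \ln(\gamma) - \ln(b), \tag{8.93}$$

where the constant $b = \det(\sigma^2 \mathbf{I}_L) / \det((P_r + \sigma^2)\mathbf{I}_L) = (\sigma^2/(P_r + \sigma^2))^L$ is
independent of the received signal $\mathbf{y}$. By rearranging the terms in (8.93), we
can express the condition for selecting hypothesis $\mathcal{H}_1$ as

$$\mathbf{y}^H \mathbf{y} \geq \frac{\sigma^2 (P_r + \sigma^2)(\ln(\gamma) - \ln(b))}{P_r} = \gamma', \tag{8.94}$$

where $\gamma'$ is the revised threshold variable that must be selected so that

$$P_{\mathrm{FA}} = \int_{\gamma'}^{\infty} f_{\mathbf{y}^H\mathbf{y}|\mathcal{H}_0}(y|\mathcal{H}_0)\, dy = \int_{\gamma'}^{\infty} \left(\frac{1}{\sigma^2}\right)^{L} \frac{y^{L-1} e^{-\frac{y}{\sigma^2}}}{(L-1)!}\, dy = \alpha, \tag{8.95}$$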
where the last equality follows from (2.99) because $\mathbf{y}^H \mathbf{y} = \|\mathbf{y}\|^2$ has a scaled
$\chi^2$-distribution under the hypothesis $\mathcal{H}_0$. The integral can be computed using
the incomplete gamma function, but it lacks a closed-form inverse, so (8.95) 
must be solved numerically. 
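One way to solve (8.95) numerically is sketched below. For integer $L$, the tail probability of the scaled $\chi^2$ (Gamma) distribution has the finite-sum form $\Pr\{\mathbf{y}^H\mathbf{y} > t\} = e^{-t/\sigma^2}\sum_{k=0}^{L-1} (t/\sigma^2)^k/k!$, and since it decreases monotonically in $t$, bisection finds $\gamma'$. The function names are hypothetical, and the direct summation is intended for moderate $L$ (the factor $e^{-t/\sigma^2}$ underflows once $t/\sigma^2$ reaches several hundred).

```python
import math

def gamma_tail(t, shape, scale):
    """P(X > t) for X ~ Gamma(shape, scale) with integer shape, i.e., the tail of
    y^H y under H0 when shape = L and scale = sigma^2."""
    x = t / scale
    term = math.exp(-x)  # k = 0 term of the finite sum
    total = term
    for k in range(1, shape):
        term *= x / k
        total += term
    return total

def swerling2_threshold(n_symbols, noise_var, alpha, rel_tol=1e-12):
    """Solve P_FA = alpha in (8.95) for gamma' by bisection."""
    lo, hi = 0.0, n_symbols * noise_var
    while gamma_tail(hi, n_symbols, noise_var) > alpha:  # expand until the root is bracketed
        lo, hi = hi, 2 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if gamma_tail(mid, n_symbols, noise_var) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

for n_symbols in (1, 10, 100):
    print(n_symbols, swerling2_threshold(n_symbols, noise_var=1.0, alpha=1e-3))
```

For $L = 1$, the result reduces to $\sigma^2\ln(\alpha^{-1})$, as in (8.86). If SciPy is available, `scipy.special.gammainccinv(L, alpha)` returns the normalized threshold $\gamma'/\sigma^2$ directly.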

The term $\mathbf{y}^H \mathbf{y} = \sum_{l=1}^{L} |y[l]|^2$ in (8.94) is the sufficient statistic for target
detection in this scenario. Hence, the optimal receiver processing for target
detection adds up the powers of the individual received signals $y[l]$ and
compares the result to the predefined threshold $\gamma'$. This approach differs from
the detector derived for the Swerling 1 target model. The reason is that the
channel coefficient $c_{\mathrm{RCS}}[l]$ takes a new unknown realization at every time
instant, so the receiver cannot coherently combine the signals. One way to
quantify the difference is to compute the total power of the received signal under $\mathcal{H}_1$:

$$\mathbb{E}\{\|\mathbf{y}\|^2\} = \mathrm{tr}((P_r + \sigma^2)\mathbf{I}_L) = L(P_r + \sigma^2). \tag{8.96}$$

The average effective SNR is $P_r/\sigma^2$, which is independent of $L$. This is different
from (8.88) where an $L$ times larger SNR value was achieved with the Swerling
1 target model, thanks to the coherent combining at the receiver. Fortunately,
the term $P_r$ remains proportional to the beamforming gain $MK$, so we still
benefit from having multiple antennas because the RCS realization is the
same for all antennas.

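For a selected threshold $\gamma'$, the detection probability $P_{\mathrm{D}}$ is given as

$$P_{\mathrm{D}} = \int_{\gamma'}^{\infty} f_{\mathbf{y}^H\mathbf{y}|\mathcal{H}_1}(y|\mathcal{H}_1)\, dy. \tag{8.97}$$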

To exemplify the Neyman-Pearson detector for solving the binary hy- 
pothesis test with the Swerling 2 target model, we consider the false alarm 
probability Pra = a = 107°. Figure 8.24 shows the detection probability, 
$P_{\mathrm{D}}$, versus the single-antenna SNR, which is computed as in Figure 8.23. We
consider a symmetric setup where both the transmitter and receiver have $M$
antennas (i.e., $K = M$). Hence, the effective SNR is obtained by multiplying
the single-antenna SNR on the horizontal axis by the beamforming gain $M^2$.
There are three curves, which represent different numbers of antennas $M$
and symbols $L$. As expected, the detection probability improves as the SNR
increases. When the number of antennas increases from $M = 1$ to $M = 10$ (i.e.,
from solid black to dashed red), a beamforming gain of $M^2 = 100 = 20$ dB is
achieved. This shifts the detection probability curve to the left by 20 dB. This
can make an immense difference: when the single-antenna SNR is −10 dB,
$P_{\mathrm{D}}$ increases from almost zero to almost one. If we increase the number of
symbols from $L = 10$ to $L = 100$, the detection probability curve is further
shifted to the left, but the gain is much less than 10 dB, even if we receive
10 times more power. It might come as a surprise that the curve is shifted at
all because we observed in (8.96) that the average SNR is independent of $L$.
Although the receive combining does not provide any coherent power gain,




[Plot: detection probability $P_{\mathrm{D}}$ (from 0 to 1) versus SNR [dB] (from −40 to 20 dB), with curves for $M = 1, L = 10$; $M = 10, L = 10$; and $M = 10, L = 100$.]


Figure 8.24: The detection probability for different numbers of transmit/receive antennas and 
received symbols with respect to the single-antenna SNR for the Swerling 2 model. 


we achieve a time diversity gain that makes the distribution of $\|\mathbf{y}\|^2$ more
confined around its mean when $L$ is increased. Such diversity is beneficial
when we try to reach small probabilities, such as Ppa = 107°. 
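The detection probability in (8.97) has the same finite-sum form, since $\mathbf{y}^H\mathbf{y}$ is Gamma distributed with shape $L$ and scale $P_r + \sigma^2$ under $\mathcal{H}_1$. The sketch below uses this to find the per-symbol SNR $P_r/\sigma^2$ needed for $P_{\mathrm{D}} = 0.5$ with $L = 10$ and $L = 100$. Because the false alarm probability used for Figure 8.24 is not legible in this text, the sketch assumes $\alpha = 10^{-3}$; the exact shift it reports depends on that assumption, but it illustrates that the gain from increasing $L$ stays below the 10 dB that coherent combining would provide. Function names are hypothetical and $\sigma^2$ is normalized to one.

```python
import math

def gamma_tail(t, shape, scale):
    # P(X > t) for X ~ Gamma(shape, scale), integer shape (finite Poisson-type sum).
    x = t / scale
    term = math.exp(-x)
    total = term
    for k in range(1, shape):
        term *= x / k
        total += term
    return total

def bisect(f, lo, hi, iters=100):
    # Root of an increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def threshold(n_symbols, alpha):
    # gamma' from (8.95) with sigma^2 = 1; the tail decreases in t, so alpha - tail increases.
    return bisect(lambda t: alpha - gamma_tail(t, n_symbols, 1.0), 0.0, 10.0 * n_symbols + 100.0)

def swerling2_pd(snr, n_symbols, thr):
    # (8.97): under H1, y^H y ~ Gamma(L, P_r + sigma^2) with P_r = snr.
    return gamma_tail(thr, n_symbols, 1.0 + snr)

def snr_db_for_pd(target_pd, n_symbols, alpha):
    thr = threshold(n_symbols, alpha)
    return bisect(lambda db: swerling2_pd(10 ** (db / 10), n_symbols, thr) - target_pd, -40.0, 40.0)

alpha = 1e-3  # assumed value for illustration
needed = {n: snr_db_for_pd(0.5, n, alpha) for n in (10, 100)}
print({n: round(db, 2) for n, db in needed.items()})
print(f"gain from L = 10 to L = 100: {needed[10] - needed[100]:.2f} dB")
```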

By comparing Figure 8.24 with the Swerling 1 counterpart in Figure 8.23,
we can notice two main things. Firstly, the SNR values that give $P_{\mathrm{D}} = 0.5$ (i.e., the
median) are shifted to the right in Swerling 2, so the increased randomness
generally leads to performance degradation. Secondly, the detection probability
curves are steeper with Swerling 2 due to the time diversity that suppresses
the channel’s randomness. When the SNR is low, the power gain brought
by coherent combining with Swerling 1 is preferable over the diversity gain.
However, the diversity gain obtained in Swerling 2 dominates the loss due
to non-coherent combining at high SNR, where the noise level is already
much smaller than the average signal level. Hence, Swerling 2 provides better
performance than Swerling 1 in these situations.
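The crossover between the two models can be reproduced by evaluating both detection probabilities at the same per-symbol SNR $P_r/\sigma^2$. The sketch combines the Swerling 1 expression $P_{\mathrm{D}} = \alpha^{1/(1+P_rL/\sigma^2)}$, which follows from (8.86)-(8.88), with the Gamma tail that solves (8.95) and (8.97) for Swerling 2. The choices $L = 10$, $\alpha = 10^{-3}$, the SNR grid, and the function names are illustrative assumptions.

```python
import math

def gamma_tail(t, shape, scale):
    # P(X > t) for X ~ Gamma(shape, scale), integer shape.
    x = t / scale
    term = math.exp(-x)
    total = term
    for k in range(1, shape):
        term *= x / k
        total += term
    return total

def swerling2_threshold(n_symbols, alpha):
    # Bisection on (8.95) with sigma^2 = 1.
    lo, hi = 0.0, 10.0 * n_symbols + 100.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_tail(mid, n_symbols, 1.0) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def pd_swerling1(snr, n_symbols, alpha):
    # Coherent combining: effective SNR snr * L.
    return alpha ** (1 / (1 + snr * n_symbols))

def pd_swerling2(snr, n_symbols, alpha):
    # Non-coherent energy detection: y^H y ~ Gamma(L, 1 + snr) under H1.
    return gamma_tail(swerling2_threshold(n_symbols, alpha), n_symbols, 1.0 + snr)

L, alpha = 10, 1e-3
for snr_db in (-10, -5, 0, 5, 10):
    snr = 10 ** (snr_db / 10)
    print(f"{snr_db:>4} dB: Swerling 1 {pd_swerling1(snr, L, alpha):.3f}, "
          f"Swerling 2 {pd_swerling2(snr, L, alpha):.3f}")
```

At −10 dB the Swerling 1 detector comes out ahead, whereas at 10 dB the Swerling 2 detector approaches one while Swerling 1 is still limited by the single exponential realization.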

The choice of the RCS model clearly impacts the target detection performance.
The Swerling 1 model is suitable when the target is approximately
static during the transmission time, while the Swerling 2 model is suitable for highly
mobile targets. One could also create an intermediate block-fading-like model
where the RCS realization is constant for multiple symbols but not the entire
transmission time. There are further Swerling models where the RCS parameter
has a different distribution than complex Gaussian [134]-[136]. As shown
in Figure 5.4, the Gaussian distribution is a good approximation when there are at least five
equally strong scattering objects on the target object, but some targets might
have a shape that is not well modeled in this way.
Example 8.13. Consider a multi-static target detection setup with a single
transmitter and two spatially separated receivers. Derive the sufficient
statistic of the Neyman-Pearson detector with the Swerling 2 target model.
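In this scenario, we can express the binary hypothesis test as

$$\mathcal{H}_0: \; y_1[l] = n_1[l], \quad y_2[l] = n_2[l], \quad l = 1, \ldots, L, \tag{8.98}$$
$$\mathcal{H}_1: \; y_1[l] = \sqrt{P_{r,1}}\, c_1[l] + n_1[l], \quad y_2[l] = \sqrt{P_{r,2}}\, c_2[l] + n_2[l], \quad l = 1, \ldots, L, \tag{8.99}$$

where $c_1[l] \sim \mathcal{N}_{\mathbb{C}}(0,1)$ and $c_2[l] \sim \mathcal{N}_{\mathbb{C}}(0,1)$ are the independent random RCS
coefficients, while $P_{r,1}$ and $P_{r,2}$ denote the average received powers at the
two receivers. These might be different since the average RCS depends on
the receivers’ angles to the target. The noise samples $n_1[l] \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ and
$n_2[l] \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ are independent since the receivers are spatially separated.
We define $\mathbf{y}_m = [y_m[1], \ldots, y_m[L]]^T \in \mathbb{C}^L$, $\mathbf{c}_m = [c_m[1], \ldots, c_m[L]]^T \in \mathbb{C}^L$,
and $\mathbf{n}_m = [n_m[1], \ldots, n_m[L]]^T \in \mathbb{C}^L$, for $m = 1, 2$. We can now note that
$\mathbf{c}_m \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_L)$ and $\mathbf{n}_m \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2 \mathbf{I}_L)$. Since $\mathbf{y}_1$ and $\mathbf{y}_2$ are independent
under both hypotheses, we construct the likelihood ratio in (2.191) as

$$\frac{f_{\mathbf{y}_1,\mathbf{y}_2|\mathcal{H}_1}(\mathbf{y}_1,\mathbf{y}_2|\mathcal{H}_1)}{f_{\mathbf{y}_1,\mathbf{y}_2|\mathcal{H}_0}(\mathbf{y}_1,\mathbf{y}_2|\mathcal{H}_0)} = \frac{\frac{1}{\pi^{2L}(P_{r,1}+\sigma^2)^L (P_{r,2}+\sigma^2)^L} e^{-\frac{\mathbf{y}_1^H\mathbf{y}_1}{P_{r,1}+\sigma^2} - \frac{\mathbf{y}_2^H\mathbf{y}_2}{P_{r,2}+\sigma^2}}}{\frac{1}{\pi^{2L}\sigma^{4L}} e^{-\frac{\mathbf{y}_1^H\mathbf{y}_1 + \mathbf{y}_2^H\mathbf{y}_2}{\sigma^2}}} \geq \gamma. \tag{8.100}$$

Taking the logarithm of both sides and moving the terms that do not depend on
the received signals to the right-hand side, the sufficient statistic for the
Neyman-Pearson detector can be expressed as

$$\left(\frac{1}{\sigma^2} - \frac{1}{P_{r,1} + \sigma^2}\right) \mathbf{y}_1^H \mathbf{y}_1 + \left(\frac{1}{\sigma^2} - \frac{1}{P_{r,2} + \sigma^2}\right) \mathbf{y}_2^H \mathbf{y}_2. \tag{8.101}$$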
This is a weighted sum of the sufficient statistics the receivers would use in 
the single-receiver case. The receiver that experiences the largest received 
power uses the largest weight, but both receivers are useful. 
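A minimal sketch of (8.101) follows. Under $\mathcal{H}_0$, the statistic is a weighted sum of two Gamma-distributed terms with different weights, so the sketch calibrates the threshold empirically from simulated noise-only data; this Monte Carlo calibration, the parameter values, and the function names are assumptions for illustration rather than a method from the text.

```python
import math
import random

def cn(rng, variance):
    # Circularly symmetric complex Gaussian N_C(0, variance).
    std = math.sqrt(variance / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))

def multistatic_statistic(y1, y2, p_r1, p_r2, noise_var):
    """Weighted energy statistic from (8.101)."""
    w1 = 1 / noise_var - 1 / (p_r1 + noise_var)
    w2 = 1 / noise_var - 1 / (p_r2 + noise_var)
    return w1 * sum(abs(v) ** 2 for v in y1) + w2 * sum(abs(v) ** 2 for v in y2)

def simulate_statistics(target_present, p_r1, p_r2, noise_var, n_symbols, trials, rng):
    stats = []
    for _ in range(trials):
        # Swerling 2: independent c_1[l], c_2[l] ~ N_C(0, 1) per symbol and receiver.
        y1 = [(math.sqrt(p_r1) * cn(rng, 1.0) if target_present else 0) + cn(rng, noise_var)
              for _ in range(n_symbols)]
        y2 = [(math.sqrt(p_r2) * cn(rng, 1.0) if target_present else 0) + cn(rng, noise_var)
              for _ in range(n_symbols)]
        stats.append(multistatic_statistic(y1, y2, p_r1, p_r2, noise_var))
    return stats

rng = random.Random(5)
alpha, p_r1, p_r2, noise_var, L = 0.05, 2.0, 0.5, 1.0, 10
h0 = sorted(simulate_statistics(False, p_r1, p_r2, noise_var, L, 10000, rng))
threshold = h0[int((1 - alpha) * len(h0))]  # empirical (1 - alpha)-quantile under H0
h1 = simulate_statistics(True, p_r1, p_r2, noise_var, L, 10000, rng)
print(f"estimated P_D = {sum(s >= threshold for s in h1) / len(h1):.3f}")
```

The weight $1/\sigma^2 - 1/(P_{r,m} + \sigma^2)$ grows with $P_{r,m}$, so receiver 1 (with the larger assumed average power) contributes more to the statistic.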


8.3.4 Different Types of Radar Antenna Arrays 


The radar technology dates back to Christian Hülsmeyer, who filed a patent
in 1904 on a system that uses electromagnetic waves to detect metallic objects 
[138], and it was demonstrated for target detection at sea to avoid ship 
collisions. The technology was not utilized at scale until the Second World 
War, which is when the United States Navy introduced the radar abbreviation. 
A classical radar consists of a highly directive antenna that is mechanically 
rotated over time to scan different angular directions sequentially. The passive 
electronically scanned array (PESA) technology appeared in the 1960s based
on the analog beamforming architecture, previously illustrated in Figure 7.10.
The directivity is controlled by electrical beamforming in PESA radars, which
enables faster scanning and more flexibility in selecting which directions to consider than
mechanical beamforming. These features are particularly useful for target
tracking. Some PESA radars can emit/receive multiple beams simultaneously, 
which resembles the hybrid beamforming architecture in Figure 7.12. 

The most capable radars use the digital beamforming architecture, where 
each antenna is directly connected to the digital baseband as in Figure 7.9. 
This is called the active electronically scanned array (AESA) technology 
and enables simultaneous beamforming in different directions at different 
frequencies. Practical implementations began in the 1990s, but the higher 
implementation cost has thus far led to AESA radars primarily being used 
in mission-critical military applications where many targets must be simul- 
taneously detected, localized, tracked, and potentially attacked. This might 
change when MIMO communication systems evolve into ISAC systems, where 
the digital architecture required for high-capacity MIMO communications is 
also used for sensing applications. For this reason, the theory described in 
this chapter presumes the use of the digital architecture. 

The term MIMO radar has been used for decades [139], [140] and has created
some controversy [141] because not all MIMO communication features are 
helpful for radars. For example, the ergodic capacity in (5.131) over a point-to- 
point MIMO channel is achieved by spreading many independent data signals 
in different directions, and the sum capacity of a multi-user MIMO channel 
is achieved by sending many simultaneous signals even if this reduces the 
capacity of individual streams and users. In radar applications, the accuracy of 
individual sensing tasks might be more important than the ability to spatially 
multiplex many sensing tasks if the latter comes with reduced accuracy. The 
pragmatic view is that MIMO radar theory [142] describes how to operate 
AESA radars in different situations, which sometimes results in the same 
functionality as a PESA radar—similar to how beamforming of one signal 
is capacity-achieving in point-to-point MIMO systems that have low SNR. 
In other situations, AESA radars can benefit from simultaneous detection of 
multiple targets, higher spatial resolution, flexible interference suppression, 
and different directivity at different frequencies [118]. 

An additional way to improve spatial resolution is to utilize a synthetic 
aperture created by moving the antennas during the measurement period. If 
the deployment location is fixed, the antennas can be moved around at that 
location. If the radar is deployed on a satellite that travels around the Earth, 
a synthetic aperture is created even if the antennas are fixed on the satellite.
In any case, by combining the measurements made at different times, the radar
can achieve the same resolution as a physical array that simultaneously has
antennas at all the measurement locations.
8.3.5 Integrated Sensing and Communication


The term ISAC is used to describe network deployments that are jointly 
designed for sensing and communication applications [123], in contrast to how 
communication networks and radar systems have been developed and deployed 
independently in the past. The fusion of these technologies became particularly
interesting when communication systems began to use mmWave bands, which lie
in the spectrum range traditionally used for radar [143]. Apart from cost
savings, a dual-functional network might provide performance benefits to the
different applications by sharing information between them, and new joint
radar-communication services might arise [144].

The integration can occur at different levels, three of which are illustrated
in Figure 8.25. At the first level, shown in Figure 8.25(a), the deployment
sites for communication networks are reused for deploying radar transceivers,
but the systems are otherwise independent: they use different hardware and
frequency bands. At the second level, shown in Figure 8.25(b), the same
transceiver hardware is used for both applications, but they use orthogonal
signal resources. The network can either switch between sensing and
communication over time or use non-overlapping frequency bands that are
sufficiently close for the same hardware components to serve both purposes. The
benefit of this approach is that the signal waveforms can be optimized for the
respective applications without making tradeoffs. At the third level, shown in
Figure 8.25(c), the same time-frequency resources are used for both sensing
and communication purposes. The benefit of this approach is that more signal
resources are available for both applications, while the drawbacks are mutual
interference and signal transmissions that cannot be optimized for each purpose
separately.

The theory for sensing provided in this chapter directly applies to the first 
two integration levels, while the third level gives rise to different system models. 
A basic mono-static level-three ISAC setup is illustrated in Figure 8.25(c), 
where the base station transmits a communication signal to a data-receiving 
user but also listens to the reflection of the data signal from the target. Since 
the transmitter knows the data signal, it can be used for target detection; we 
recall from Example 8.12 that any signal with a specified average power works 
equally well for that purpose. However, the user prefers data transmission 
with MRT precoding, while the target detection probability is maximized 
if the signal is beamformed towards the potential target location. This is 
an example of the inherent tradeoff between sensing and communication, 
which materializes in conflicting precoding designs in this basic scenario. We 
refer to [145] for a more comprehensive overview of ISAC, also known as joint
communication and sensing.
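The conflict between MRT towards the user and beamforming towards the target can be made concrete with a small numerical sketch. It assumes a hypothetical ULA with half-wavelength spacing, LOS directions to the user and the target that are known to the base station, and a simple precoder that blends the two MRT directions with a weight w before scaling to unit norm; the angles, the weight, and the blending rule are illustrative assumptions, not a design from the text. The gain in a direction with array response a is computed as |a^T p|^2, following the convention used in the book's beamforming analysis.

```python
import cmath
import math


def ula_response(M, angle, spacing=0.5):
    """Array response of an M-antenna ULA; spacing in wavelengths (half-wavelength assumed)."""
    return [cmath.exp(-2j * math.pi * spacing * m * math.sin(angle)) for m in range(M)]


def beam_gain(a, p):
    """Beamforming gain |a^T p|^2 in a direction with array response a."""
    return abs(sum(am * pm for am, pm in zip(a, p))) ** 2


def blended_precoder(a_user, a_target, w):
    """Hypothetical precoder: weighted sum of the MRT directions towards the user (weight w)
    and the target (weight 1 - w), scaled to unit norm."""
    v = [math.sqrt(w) * au.conjugate() + math.sqrt(1 - w) * at.conjugate()
         for au, at in zip(a_user, a_target)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def isac_tradeoff(M, user_angle, target_angle, weights):
    """Return (w, user gain, target gain) for each blending weight w in [0, 1]."""
    a_user = ula_response(M, user_angle)
    a_target = ula_response(M, target_angle)
    results = []
    for w in weights:
        p = blended_precoder(a_user, a_target, w)
        results.append((w, beam_gain(a_user, p), beam_gain(a_target, p)))
    return results


if __name__ == "__main__":
    for w, g_user, g_target in isac_tradeoff(8, math.radians(30), math.radians(-20),
                                             [0.0, 0.25, 0.5, 0.75, 1.0]):
        print(f"w = {w:.2f}: user gain {g_user:5.2f}, target gain {g_target:5.2f}")
```

With w = 1, the precoder is MRT towards the user and the user gain reaches M = 8, while w = 0 gives the target direction the full gain M; intermediate weights trade one gain against the other.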
(a) First integration level: Site-sharing but separate hardware and frequency bands.
(c) Third integration level: Hardware- and signal-sharing for sensing/communication.

Figure 8.25: Example of three integration levels for sensing and communication. Different
hardware, technology, and spectrum are used in (a), but the site location is shared. The same
hardware is used in (b), but the time/frequency resources differ. The same hardware and
resources are used for both applications in (c).
8.4 Exercises


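Exercise 8.1. When deriving the Capon spectrum in (8.20), it is implicitly assumed that
the sample average estimate $\hat{\mathbf{R}}_L$ in (8.4) is an invertible matrix. In this exercise, we will
analyze the opposite case when $\hat{\mathbf{R}}_L$ is rank-deficient. By defining $\mathbf{Y}_L = [\mathbf{y}[1], \ldots, \mathbf{y}[L]] \in
\mathbb{C}^{M \times L}$, we can write $\hat{\mathbf{R}}_L = \mathbf{Y}_L\mathbf{Y}_L^{\mathrm{H}}/L$. Let the SVD of $\mathbf{Y}_L$ be denoted as $\mathbf{Y}_L = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathrm{H}}$.
Then, the eigendecomposition of $\hat{\mathbf{R}}_L$ can be expressed as $\hat{\mathbf{R}}_L = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathrm{H}}\mathbf{V}\boldsymbol{\Sigma}^{\mathrm{T}}\mathbf{U}^{\mathrm{H}}/L =
\mathbf{U}\left(\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{\mathrm{T}}/L\right)\mathbf{U}^{\mathrm{H}}$.

(a) Show that if $L < M$, $\hat{\mathbf{R}}_L$ is rank-deficient. Hint: You can use the SVD of $\mathbf{Y}_L$ to
compute the number of non-zero eigenvalues of $\hat{\mathbf{R}}_L$.

(b) Assume $L < M$ so that $\mathbf{Y}_L$ has $L$ positive singular values. The left singular
vector matrix can be factorized as $\mathbf{U} = [\bar{\mathbf{U}}, \tilde{\mathbf{U}}]$, where $\bar{\mathbf{U}} \in \mathbb{C}^{M \times L}$ corresponds
to the strictly positive singular values (in decreasing order) and $\tilde{\mathbf{U}} \in \mathbb{C}^{M \times (M-L)}$
corresponds to the zero singular values. We can express any array response vector
as $\mathbf{a}(\varphi, \theta) = \bar{\mathbf{U}}\bar{\mathbf{x}} + \tilde{\mathbf{U}}\tilde{\mathbf{x}}$ in terms of the vectors $\bar{\mathbf{x}} = \bar{\mathbf{U}}^{\mathrm{H}}\mathbf{a}(\varphi, \theta)$ and $\tilde{\mathbf{x}} = \tilde{\mathbf{U}}^{\mathrm{H}}\mathbf{a}(\varphi, \theta)$.
Show that the objective function of the Capon spectrum in (8.17) becomes zero
for any $\mathbf{a}(\varphi, \theta)$ that satisfies $\tilde{\mathbf{x}} = \tilde{\mathbf{U}}^{\mathrm{H}}\mathbf{a}(\varphi, \theta) \neq \mathbf{0}$.

(c) According to (b), the Capon spectrum becomes zero when $\tilde{\mathbf{x}} = \tilde{\mathbf{U}}^{\mathrm{H}}\mathbf{a}(\varphi, \theta) \neq \mathbf{0}$,
regardless of the value of $\bar{\mathbf{x}} = \bar{\mathbf{U}}^{\mathrm{H}}\mathbf{a}(\varphi, \theta)$. A more noise-robust version of the
Capon spectrum that differentiates between the power of $\bar{\mathbf{x}}$ for different array
response vectors can be constructed by so-called diagonal loading. In this method,
a regularization term $\epsilon\mathbf{I}_M$ with a small $\epsilon > 0$ is added to $\hat{\mathbf{R}}_L$ to make it invertible,
and the modified Capon spectrum is obtained as

$$
P(\varphi, \theta) = \frac{1}{\mathbf{a}^{\mathrm{H}}(\varphi, \theta)\left(\hat{\mathbf{R}}_L + \epsilon\mathbf{I}_M\right)^{-1}\mathbf{a}(\varphi, \theta)}. \tag{8.102}
$$

Assuming $L < M$ and that $\mathbf{Y}_L$ has the singular values $s_1 \geq \cdots \geq s_L > 0$, express
the value of the Capon spectrum for an arbitrary $\mathbf{a}(\varphi, \theta) = \bar{\mathbf{U}}\bar{\mathbf{x}} + \tilde{\mathbf{U}}\tilde{\mathbf{x}} = \mathbf{U}[\bar{\mathbf{x}}^{\mathrm{T}}, \tilde{\mathbf{x}}^{\mathrm{T}}]^{\mathrm{T}}$
in terms of $\bar{\mathbf{x}}$ and $\tilde{\mathbf{x}}$. Does the spectrum value differ for the array response vectors
that satisfy $\tilde{\mathbf{x}} = \tilde{\mathbf{U}}^{\mathrm{H}}\mathbf{a}(\varphi, \theta) \neq \mathbf{0}$? Hint: Use the relation

$$
\begin{aligned}
\left(\hat{\mathbf{R}}_L + \epsilon\mathbf{I}_M\right)^{-1} &= \left(\mathbf{U}\left(\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{\mathrm{T}}/L\right)\mathbf{U}^{\mathrm{H}} + \epsilon\mathbf{U}\mathbf{U}^{\mathrm{H}}\right)^{-1} \\
&= \mathbf{U}\left(\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{\mathrm{T}}/L + \epsilon\mathbf{I}_M\right)^{-1}\mathbf{U}^{\mathrm{H}}.
\end{aligned} \tag{8.103}
$$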


Exercise 8.2. When generating the MUSIC spectrum in (8.46), we need to create a grid
of angles and evaluate the value of the spectrum at each grid point. Hence, the accuracy
of the DOA estimation highly depends on the grid resolution. A dense grid improves the
accuracy, but it also leads to high computational complexity, which is a major drawback
of the original MUSIC algorithm, called spectral MUSIC. A modified version of the
MUSIC algorithm that avoids the grid search is called root MUSIC [146].

(a) Define the complex variable $z = e^{-j2\pi\frac{\Delta\sin(\varphi)}{\lambda}}$ and express the array response vector
for the ULA in (8.8) as a function $\mathbf{a}(z)$ of $z$.

(b) Suppose that there are $K$ sources. Show that the DOA estimates $\hat{\varphi}_k$ can be
obtained from the angular positions $-2\pi\frac{\Delta\sin(\hat{\varphi}_k)}{\lambda}$ of the $K$ complex roots of the equation

$$
\mathbf{a}^{\mathrm{T}}(z^{-1})\mathbf{U}_{\mathrm{n}}\mathbf{U}_{\mathrm{n}}^{\mathrm{H}}\mathbf{a}(z) = 0
$$

that are closest to the unit circle, where the roots of this equation appear in reciprocal pairs.
Exercise 8.3. Consider a DOA estimation problem in the 2D plane with the elevation
angle $\theta = 0$. There are $M = 3$ antennas in a triangular array with the antenna positions
$(0, 0)$, $(\Delta, 0)$, and $(0, \Delta)$.
(a) If $\Delta = \lambda/2$, can we unambiguously estimate the DOA for all the angles $\varphi \in [0, 2\pi)$?
If not, what are the azimuth angles that create ambiguity?

(b) If $\Delta = \lambda/2 - \epsilon$ for some arbitrary $0 < \epsilon < \lambda/2$, can we unambiguously estimate
the DOA for all the angles $\varphi \in [0, 2\pi)$? If not, what are the azimuth angles that
create ambiguity?


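Exercise 8.4. Non-linear least squares (NLS) is a parametric DOA estimation method,
where the DOA estimates are obtained as the angles that minimize the squared norm of
the difference between the received noisy signals and the noise-free part in (8.3):

$$
\sum_{l=1}^{L}\left\|\mathbf{y}[l] - \sum_{k=1}^{K}\sqrt{\beta_k}\,\mathbf{a}(\varphi_k, \theta_k)x_k[l]\right\|^2 = \sum_{l=1}^{L}\left\|\mathbf{y}[l] - \mathbf{A}\mathbf{p}[l]\right\|^2, \tag{8.104}
$$

where $\mathbf{A} = [\mathbf{a}(\varphi_1, \theta_1), \ldots, \mathbf{a}(\varphi_K, \theta_K)]$ and $\mathbf{p}[l] = [\sqrt{\beta_1}x_1[l], \ldots, \sqrt{\beta_K}x_K[l]]^{\mathrm{T}}$. The unknown
source signals $\mathbf{p}[l]$ are treated as deterministic in the NLS method. Assume the rank of
$\mathbf{A}$ equals $K$.

(a) Find the vectors $\mathbf{p}[l]$ that minimize (8.104) by expressing the objective function
as a quadratic function of $\mathbf{p}[l]$, for $l = 1, \ldots, L$.

(b) Insert the optimal $\mathbf{p}[l]$ found in (a) into the objective function in (8.104) and show
that the DOA estimates are found as

$$
\{(\hat{\varphi}_k, \hat{\theta}_k)\}_{k=1}^{K} = \underset{\{(\varphi_k, \theta_k)\}_{k=1}^{K}}{\arg\max}\ \sum_{l=1}^{L}\mathbf{y}^{\mathrm{H}}[l]\mathbf{A}\left(\mathbf{A}^{\mathrm{H}}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathrm{H}}\mathbf{y}[l]. \tag{8.105}
$$

(c) Show that the NLS method becomes equivalent to conventional beamforming if
$K = 1$.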


Exercise 8.5. Consider the received signal given in (8.38) for DOA estimation, which
can be expressed as

$$
\mathbf{y}[l] = \mathbf{A}\mathbf{p}[l] + \mathbf{n}[l], \tag{8.106}
$$

where $K < M$, $\mathbf{A} = [\mathbf{a}(\varphi_1, \theta_1), \ldots, \mathbf{a}(\varphi_K, \theta_K)]$, and $\mathbf{p}[l] = [\sqrt{\beta_1}x_1[l], \ldots, \sqrt{\beta_K}x_K[l]]^{\mathrm{T}}$.
(a) Suppose the noise signal is colored with an invertible covariance matrix $\mathbf{C}$, i.e.,
$\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C})$. To apply the MUSIC algorithm, we must first whiten the signals
$\mathbf{y}[l]$. What is the resulting MUSIC spectrum?

(b) In practice, mutual coupling can occur due to interaction between closely spaced
antennas in an array. There exist array calibration methods that can mitigate
these effects, but there will be residual calibration errors. Suppose the received
signal can be modeled as [147]

$$
\mathbf{y}[l] = \mathbf{M}\mathbf{A}\mathbf{p}[l] + \mathbf{n}[l], \tag{8.107}
$$

where $\mathbf{M} \in \mathbb{C}^{M \times M}$ is a non-singular matrix. If $\mathbf{n}[l] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2\mathbf{I}_M)$, obtain the
MUSIC spectrum for this signal model.
Exercise 8.6. In this exercise, we consider mixed TDOA/DOA localization. A single target
node is located at $(x, y)$, and there are $M$ receivers. The 2D coordinates of receiver $m$ are
denoted by $(x_m, y_m)$. Suppose the receivers with the indices $2, \ldots, \bar{M}$, where $\bar{M} < M$,
provide TDOA measurements with respect to the reference receiver 1. The remaining
receivers with the indices $\bar{M}+1, \ldots, M$ provide DOA estimates. Let $\mathbf{r}_{\mathrm{TDOA}} = [r_{2,1}, \ldots, r_{\bar{M},1}]^{\mathrm{T}}$
and $\mathbf{r}_{\mathrm{DOA}} = [r_{\bar{M}+1}, \ldots, r_M]^{\mathrm{T}}$ be the noisy measurements obtained with TDOA and DOA,
respectively. The respective measurement noise is denoted by
$\mathbf{n}_{\mathrm{TDOA}} = [n_{2,1}, \ldots, n_{\bar{M},1}]^{\mathrm{T}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C}_{\mathrm{TDOA}})$ and $\mathbf{n}_{\mathrm{DOA}} = [n_{\bar{M}+1}, \ldots, n_M]^{\mathrm{T}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C}_{\mathrm{DOA}})$.

(a) Derive the ML cost function for the mixed TDOA/DOA localization, assuming
that the measurement noises are independent.

(b) What is the minimum number of receivers for unambiguous 2D localization under
the condition $2 \leq \bar{M} < M$ if the azimuth and elevation DOAs can be estimated
separately?

(c) What is the minimum number of receivers for unambiguous 3D localization under
the condition $2 \leq \bar{M} < M$ if the receivers that estimate DOA have ULAs?


Exercise 8.7. Consider 2D TOA-based localization with a single target node and $M$
receivers with the received signals given in (8.52). One approach is to rearrange the
equations in (8.52) to obtain a linear relation with zero-mean additive noise. The LS
solution can then be obtained in closed form. The aim of this exercise is to obtain a
relation in the form of $\mathbf{b} = \mathbf{A}\mathbf{z} + \mathbf{w}$, where $\mathbf{z}$ consists of unknown variables, $\mathbf{b}$ and $\mathbf{A}$
are fixed, and $\mathbf{w}$ has zero-mean noise entries. Given such a model, the LS solution is
obtained as $\hat{\mathbf{z}} = (\mathbf{A}^{\mathrm{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathrm{T}}\mathbf{b}$ if $\mathbf{A}^{\mathrm{T}}\mathbf{A}$ is invertible, and we can obtain $(\hat{x}, \hat{y})$ using $\hat{\mathbf{z}}$.

(a) Let us define $\mathbf{z} = [x, y, x^2 + y^2]^{\mathrm{T}} \in \mathbb{R}^3$. Find the matrix $\mathbf{A} \in \mathbb{R}^{M \times 3}$ that contains
only the known receiver locations $(x_m, y_m)$ and constants. Obtain also the
observation vector $\mathbf{b}$ and noise vector $\mathbf{w}$. Hint: Take the squares on both sides of

$$
r_m = \sqrt{(x_m - x)^2 + (y_m - y)^2} + n_m, \quad m = 1, \ldots, M, \tag{8.108}
$$

where $n_m \sim \mathcal{N}(0, \sigma^2)$.
(b) Do the entries of $\mathbf{w}$ have zero mean? If not, under what conditions can we
approximate it as a zero-mean vector?


Exercise 8.8. One of the main contributors to reduced localization accuracy is the presence
of NLOS paths between the target node and some of the receivers due to the blockage of the
LOS path. In TOA, the effect of NLOS paths can be modeled as a positive bias to the
true TOAs with much larger power than the measurement errors. Suppose there is a
single target node located at $(x, y)$ and $M$ receivers. The 2D coordinates of receiver $m$
are denoted by $(x_m, y_m)$. Suppose the receivers with the indices $1, \ldots, \bar{M}$, where $\bar{M} < M$,
have a blocked LOS towards the target node, and the NLOS bias is modeled as an
exponential random variable $b_m \sim \mathrm{Exp}(1/\sigma_b^2)$ with the PDF from (2.91). For the other
receivers $\bar{M}+1, \ldots, M$, the relations in (8.52) are valid. The distance measurements for
this TOA-based localization setup can be expressed as

$$
r_m = \sqrt{(x_m - x)^2 + (y_m - y)^2} + b_m, \quad m = 1, \ldots, \bar{M}, \tag{8.109}
$$

$$
r_m = \sqrt{(x_m - x)^2 + (y_m - y)^2} + n_m, \quad m = \bar{M}+1, \ldots, M, \tag{8.110}
$$

where we have omitted the measurement noise $n_m$ for $m = 1, \ldots, \bar{M}$ since the NLOS
bias is much stronger. Assuming that $n_m \sim \mathcal{N}(0, \sigma^2)$ and all $b_m$ and $n_m$ are mutually
independent, derive the cost function to be minimized for ML estimation of $(x, y)$.
Exercise 8.9. Consider target detection with the binary hypothesis test

$$
\mathcal{H}_0:\ y[l] = n[l], \quad l = 1, \ldots, L, \tag{8.111}
$$

$$
\mathcal{H}_1:\ y[l] = \sqrt{P}c_{\mathrm{RCS}} + n[l], \quad l = 1, \ldots, L, \tag{8.112}
$$

where the RCS coefficient $c_{\mathrm{RCS}}$ is constant and known at the receiver (called the Swerling
0 model). Derive the sufficient statistics for the Neyman-Pearson detector when the
noise samples $n[l] \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ are independent.


Exercise 8.10. Consider the sufficient statistics $|\mathbf{1}^{\mathrm{T}}\mathbf{y}|^2$ in (8.85) of the Neyman-Pearson
detector with the Swerling 1 target model. The SNR of the coherently combined signal
$\mathbf{1}^{\mathrm{T}}\mathbf{y}$ under the target existence determines the detection performance of the radar
detector.

(a) Let $\rho$ denote the average single-antenna SNR of the considered radar channel.
Under the target existence, express the SNR of the coherently combined signal $\mathbf{1}^{\mathrm{T}}\mathbf{y}$
for a given number of transmit/receive antennas $M$ and the number of coherently
combined symbols $L$.

(b) Suppose the average power consumption of the system is

$$
\frac{LP}{0.25} + LM \cdot 1 + \bar{L}M \cdot 1 \ \mathrm{W}, \tag{8.113}
$$

where the first term includes a power amplifier efficiency of 25%. The second term
models that each transmit antenna consumes 1 W and is turned on for $L$ symbols.
The third term models that each receive antenna consumes 1 W and must be
active for a fixed window of $\bar{L}$ symbols to capture the reflected signal. Which
combination of $L$ and $M$ minimizes the average power consumption in (8.113)
while guaranteeing an SNR of at least 10 dB for $\mathbf{1}^{\mathrm{T}}\mathbf{y}$ if $\rho = -10$ dB, $P = 10$ W,
and $\bar{L} = 100$?


Exercise 8.11. Consider a mono-static setup with a single transmit/receive antenna for
a target detection task. Suppose the propagation between the transmitter/receiver and
the potential target is modeled using the radar range equation. Moreover, assume that
a target having an RCS of 0 dBsm is detectable at a distance of 100 m when $L = 1$. It is
desired to detect a smaller target with an RCS of $-10$ dBsm at a distance of 200 m. The
antenna gains are assumed to be fixed in this exercise.


(a) How many transmit/receive antennas M are needed to achieve the given task 
without changing any other parameters? 


(b) If the target follows the Swerling 1 model, how many symbols L are needed to 
achieve the given task without changing any other parameters? 


Exercise 8.12. Consider a multi-static target detection setup with a single transmitter
and two spatially separated receivers. Assuming the target reflection follows the Swerling
1 model, we can express the binary hypothesis test as

$$
\mathcal{H}_0:\ y_1[l] = n_1[l], \ \ y_2[l] = n_2[l], \quad l = 1, \ldots, L, \tag{8.114}
$$

$$
\mathcal{H}_1:\ y_1[l] = \sqrt{P_1}c_1 + n_1[l], \ \ y_2[l] = \sqrt{P_2}c_2 + n_2[l], \quad l = 1, \ldots, L, \tag{8.115}
$$

where $c_1 \sim \mathcal{N}_{\mathbb{C}}(0, 1)$ and $c_2 \sim \mathcal{N}_{\mathbb{C}}(0, 1)$ are the random RCSs of the target towards
receivers 1 and 2, respectively. Similarly, the subscript in the other symbols refers
to the receiver index. The noise samples have the distributions $n_1[l] \sim \mathcal{N}_{\mathbb{C}}(0, \sigma_1^2)$ and
$n_2[l] \sim \mathcal{N}_{\mathbb{C}}(0, \sigma_2^2)$. Derive the sufficient statistics in the Neyman-Pearson detector if all
the random variables are independent.
Exercise 8.13. In the hypothesis test stated in (8.90)-(8.91) for the Swerling 2 target
model, we have assumed that the transmitter sends the constant symbol “1” throughout
the $L$ symbol times. What are the sufficient statistics of the Neyman-Pearson detector
if the transmitter instead sends $\mathbf{x} = [x[1], \ldots, x[L]]^{\mathrm{T}} \in \mathbb{C}^L$, which is deterministic and
known at the receiver?


Exercise 8.14. Consider a target detection setup with the binary hypothesis test

$$
\mathcal{H}_0:\ y[l] = n[l], \quad l = 1, \ldots, L, \tag{8.116}
$$

$$
\mathcal{H}_1:\ y[l] = \sqrt{P}c_{\mathrm{RCS}}[l] + n[l], \quad l = 1, \ldots, L, \tag{8.117}
$$

where the noise is colored such that $\mathbf{n} = [n[1], \ldots, n[L]]^{\mathrm{T}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{C})$ with an invertible
covariance matrix $\mathbf{C}$.

(a) Consider the Swerling 1 target model with $c_{\mathrm{RCS}}[l] = c_{\mathrm{RCS}} \sim \mathcal{N}_{\mathbb{C}}(0, 1)$, for $l =
1, \ldots, L$. Derive the sufficient statistics of the Neyman-Pearson detector. Interpret
the result.

(b) Consider the Swerling 2 target model with independent $c_{\mathrm{RCS}}[l] \sim \mathcal{N}_{\mathbb{C}}(0, 1)$, for
$l = 1, \ldots, L$. Derive the sufficient statistics of the Neyman-Pearson detector.
Interpret the result.


Exercise 8.15. Consider a mono-static ISAC transceiver with $K$ transmit and $K$ receive
antennas. The transceiver sends data to a single user. At the same time, it wants to
detect the existence of a target at a specific location. The channels from the ISAC
transceiver to the data user and the target location are denoted as $\mathbf{h}_{\mathrm{u}} \in \mathbb{C}^K$ and $\mathbf{h}_{\mathrm{t}} \in \mathbb{C}^K$,
respectively. The channel from the target location to the receiver is $\mathbf{h}_{\mathrm{r}} \in \mathbb{C}^K$. Suppose
$\mathbf{p}s$ is transmitted, where $s \sim \mathcal{N}_{\mathbb{C}}(0, P)$ is the data signal and $\mathbf{p} \in \mathbb{C}^K$ is the unit-norm
precoding vector. Target detection is based on the signal that is reflected by the target
and received at the ISAC receiver during one time instance. A receive combining
vector $\mathbf{w} \in \mathbb{C}^K$ is applied to the received signal. The RCS variance is $\sigma_{\mathrm{RCS}}^2$. Suppose
that the receiver noises at the user and ISAC receiver are both zero-mean and have the
variance $\sigma^2$.

(a) What is the SNR of the received signal at the ISAC transceiver if the target exists?
(b) What is the combining vector $\mathbf{w}$ that maximizes the sensing SNR in (a)?

(c) Suppose that $P\|\mathbf{h}_{\mathrm{u}}\|^2/\sigma^2 = 9$ and the user requires an SNR of 1. How should the
precoding vector $\mathbf{p}$ be selected to maximize the sensing SNR subject to the user
SNR constraint if $\mathbf{h}_{\mathrm{u}}^{\mathrm{H}}\mathbf{h}_{\mathrm{t}} = 0$?


Chapter 9 


Reconfigurable Surfaces 


The previous chapters have demonstrated how the signal strength can be 
increased by equipping the transmitter and receiver with multiple antennas 
used for precoding and combining. Unfortunately, the MIMO technology can 
hardly turn a weak channel into a strong one; if the channel gain $\beta$ is tiny,
then $MK\beta$ will remain mediocre. Communication systems that operate under
NLOS conditions rely heavily on reflections by various surfaces for the signals 
to reach the intended receivers. This might lead to multiple propagation 
paths but often immense signal losses along these paths, particularly in the 
mmWave and THz bands. Figure 9.1 illustrates such a scenario, where the 
NLOS receiver can only be reached by beamforming towards a building that 
reflects the signal. Unfortunately, the building in this example is rotated such 
that the signal is mainly reflected away from the receiver, following the solid 
arrow. Can we change the reflection properties somehow so the signal bends 
around the corner and follows the dashed arrow instead? Yes, this can be 
achieved using reconfigurable surfaces, which is the topic of this chapter. 

This chapter will explore how the reflection properties can be dynamically 
tuned using reconfigurable surfaces to aid the communication between a 
transmitter and receiver. We begin by explaining how reflections can be 
interpreted using the beamforming characteristics from previous chapters and 
how reconfigurable surfaces can control these characteristics. We will then 
analyze how these surfaces can be configured to maximize the capacity of 
narrowband and wideband SISO channels and MIMO channels. 


9.1 Basic Physics of Reflecting Surfaces 


There are two primary categories of reflections: specular and diffuse. These 
categories are illustrated in Figure 9.2 and represent the extremes in how a 
plane wave can interact with a reflecting object. In the specular case, the 
reflected wave remains planar but changes its direction. If angles are measured 
counterclockwise with 0° being the broadside direction, then a wave with 
Figure 9.1: A downlink scenario where the transmitter can easily reach the LOS receiver. By
contrast, the NLOS receiver has rather weak channel conditions since the wall reflection directs
the signal along the solid arrow. If a reconfigurable surface is deployed on that building, the
signal can be reflected following the dashed arrow instead.


(a) Specular reflection. (b) Diffuse reflection/scattering.

Figure 9.2: A plane wave that reaches an object can be reflected in different ways, with specular
reflection and diffuse reflection/scattering being the two extremes.


the incident angle $\varphi$ is reflected with the outgoing angle $-\varphi$. This is a
consequence of the law of reflection, which is closely related to Snell's law of
refraction and often considered in optics [148]. By contrast, in the diffuse
scattering case, the reflected wave has a spherical shape with no particular
directivity; thus, the wave’s energy is spread over the propagation environment.

These categories might seem familiar because we constantly observe how 
visible light interacts with objects around us to create specular reflections on 
smooth surfaces (e.g., mirrors that provide an undistorted image) and diffuse 
reflections on rough surfaces (e.g., white walls that spread the light through 
the room). Whether a reflection is specular or diffuse depends on the size and
surface roughness of the object relative to the wavelength. Firstly, the object must be
many wavelengths wide to have the chance to provide (approximately) specular
reflection. Secondly, the surface roughness must be small compared to the wavelength.
Hence, a large object that is smooth enough to provide specular reflection for radio
waves might be too rough to provide that for visible light. On the other hand, a
perfectly smooth but physically small object might be a specular reflector for
visible light but be too small to act that way for radio waves since these have
a roughly $10^5$ times larger wavelength.¹ In fact, the physics that underpins
the specular reflection case assumes an infinitely large surface. If a finite-sized
mirror approximately provides specular reflection for visible light, it must be
$10^5$ times larger to give the same approximation accuracy for radio waves.
Specular reflection is not the only idealization; ideal diffuse scattering that is
uniform in all directions (as in Figure 9.2(b)) is also unlikely to occur in wireless
channels because even a rough object has a specific geometry that affects the
reflected wave’s shape.
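The wavelength ratio in the footnote follows directly from $\lambda = c/f$. The short Python sketch below checks it and scales a hypothetical 1 cm optical mirror to the size needed at 6 GHz; the mirror size is an illustrative assumption, and only the frequencies come from the text.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def wavelength(frequency_hz):
    """Wavelength in meters for a given carrier frequency."""
    return SPEED_OF_LIGHT / frequency_hz


green_light = wavelength(600e12)  # about 0.5 micrometers
radio = wavelength(6e9)           # about 5 centimeters
ratio = radio / green_light       # equals 600e12 / 6e9 = 1e5 (c cancels)

mirror_side_light = 0.01          # hypothetical 1 cm mirror for visible light
mirror_side_radio = ratio * mirror_side_light  # same size measured in wavelengths

print(f"Wavelength ratio: {ratio:.3g}")
print(f"A {mirror_side_light * 100:g} cm optical mirror scales to "
      f"{mirror_side_radio / 1000:g} km at 6 GHz")
```

Since the speed of light cancels in the ratio, the result depends only on the two carrier frequencies.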

The properties of the reflected signal from a finite-sized flat object can 
be derived using similar methods as in Chapter 4, where we studied antenna 
arrays. Figure 9.3(a) shows a plane wave impinging on a surface from the angle
$\varphi$. We will measure the resulting phase-shifts at three points on the surface,
which are selected as a ULA with the separation $\Delta$. If we use the left-most
point as the phase reference, then the second point observes a phase-shift
of $2\pi\frac{\Delta\sin(\varphi)}{\lambda}$ and the third point observes a phase-shift of $2\pi\frac{2\Delta\sin(\varphi)}{\lambda}$. These
phases are obtained from the wave needing to travel the additional distances
$\Delta\sin(\varphi)$ and $2\Delta\sin(\varphi)$ to reach these points. In general, we obtain the relative
phase-shifts at $M$ points in a ULA configuration with separation $\Delta$ from the
array response vector in (4.19):

$$
\mathbf{a}(\varphi) = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta\sin(\varphi)}{\lambda}} \\ e^{-j2\pi\frac{2\Delta\sin(\varphi)}{\lambda}} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta\sin(\varphi)}{\lambda}} \end{bmatrix} \in \mathbb{C}^{M}. \tag{9.1}
$$

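If the same $M$ points retransmit the signal isotropically with the mentioned
phase-shifts, we obtain the situation illustrated in Figure 9.3(b). The signal is
beamformed using the precoding vector $\mathbf{p} = \mathbf{a}(\varphi)$, which can be expressed as

$$
\mathbf{p} = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta\sin(\varphi)}{\lambda}} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta\sin(\varphi)}{\lambda}} \end{bmatrix} = \begin{bmatrix} 1 \\ e^{-j2\pi\frac{\Delta\sin(-\varphi)}{\lambda}} \\ \vdots \\ e^{-j2\pi\frac{(M-1)\Delta\sin(-\varphi)}{\lambda}} \end{bmatrix}^{*} = \mathbf{a}^{*}(-\varphi) \tag{9.2}
$$

since $\sin(\varphi) = -\sin(-\varphi)$. We recognize this as the MRT vector (without a
normalization factor) for transmission in the angular direction $-\varphi$; thus, this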


¹ This number is obtained by comparing green light having the carrier frequency 600 THz
with a wireless communication signal at 6 GHz.
(a) Phase-shifts in signal reception. (b) Transmission with the same phases.
Figure 9.3: When a plane wave impinges on a plane surface from the direction $\varphi$, the
phase-shifts over the surface can be computed by viewing it as a ULA as in (a). If the same
points on the surface retransmit the signal with the phase-shifts obtained from (a), then a beam
will be formed in the opposite direction $-\varphi$ as shown in (b).


is where the retransmitted beam is pointing. This observation is essential to 
determining the shape of the reflected signal from a plane finite-sized surface 
and is an instance of the Huygens-Fresnel principle. This principle says that 
when a wavefront interacts with an object, every point on that object can be 
viewed as a new source that emits spherical waves (isotropically). These waves’ 
constructive/destructive combinations determine the new wavefront [149]. The 
reason that a plane wave arriving from the angle $\varphi$ is beamformed towards
the angle $-\varphi$ is that the total propagation distance from the transmitter, via
each element, to a far-away receiver in that direction is identical for all the
elements in the ULA shown in Figure 9.3; thus, there will be constructive
interference in that direction. The reflected wavefront can
be determined using the methodology for computing beam patterns developed 
in Chapter 4. 
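The claim that retransmission with the received phases creates a beam towards $-\varphi$ can be checked numerically with the beam-pattern methodology from Chapter 4. The Python sketch below uses the entries of (9.1), retransmits with $\mathbf{p} = \mathbf{a}(\varphi)$ as in (9.2), and searches a grid of observation angles for the largest gain $|\mathbf{a}^{\mathrm{T}}(\psi)\mathbf{p}|^2$. The element spacing of $0.2\lambda$, the grid density, and the function names are illustrative assumptions.

```python
import cmath
import math


def ula_response(M, angle, spacing):
    """Entries of (9.1): exp(-j*2*pi*m*spacing*sin(angle)), with spacing in wavelengths."""
    return [cmath.exp(-2j * math.pi * m * spacing * math.sin(angle)) for m in range(M)]


def reradiated_gain(M, phi_in, phi_obs, spacing=0.2):
    """Retransmit with the received phases, p = a(phi_in), and return |a^T(phi_obs) p|^2."""
    p = ula_response(M, phi_in, spacing)
    a_obs = ula_response(M, phi_obs, spacing)
    return abs(sum(x * y for x, y in zip(a_obs, p))) ** 2


def strongest_direction(M, phi_in, spacing=0.2, points=3601):
    """Grid search over observation angles in [-pi/2, pi/2] for the largest gain."""
    grid = [-math.pi / 2 + k * math.pi / (points - 1) for k in range(points)]
    return max(grid, key=lambda phi: reradiated_gain(M, phi_in, phi, spacing))


if __name__ == "__main__":
    phi_in = math.radians(30)
    best = strongest_direction(8, phi_in)
    print(f"Strongest reradiation at {math.degrees(best):.2f} deg")
    # p = a(phi) coincides with the conjugate of a(-phi), as stated in (9.2).
    p = ula_response(8, phi_in, 0.2)
    mrt = [x.conjugate() for x in ula_response(8, -phi_in, 0.2)]
    print(f"Largest entry difference: {max(abs(x - y) for x, y in zip(p, mrt)):.1e}")
```

For an incident angle of 30°, the search lands at about −30°, where all $M$ terms add coherently and the gain equals $M^2$.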


9.1.1 Beam Pattern from a Reflecting Surface 


We will now compute the angular shape of the reflected signal from the two- 
dimensional surface illustrated in Figure 9.4, which is deployed in the yz-plane. 
We denote its horizontal length as $L_{\mathrm{H}}$ and vertical length as $L_{\mathrm{V}}$ to follow the
notation from the analysis of UPAs in Section 4.5.3. The considered surface is
a homogeneous perfect electric conductor (PEC), but we will first treat it as
a UPA by (hypothetically) cutting it into many tiny pieces that each have the
Figure 9.4: A plane wave impinges on a flat homogeneous PEC surface in the yz-plane of size
$L_{\mathrm{H}} \times L_{\mathrm{V}}$ meters. The channel gain observed in different observation directions $(\varphi_o, \theta_o)$ can be
computed similarly to the beam pattern of a UPA, which was characterized in Chapter 4.


physical dimension $\Delta \times \Delta$ for some $\Delta < \lambda/4$. Hence, the horizontal/vertical
antenna spacing is $\Delta$. Each antenna has an area $\Delta^2$ that is smaller than that
of an isotropic antenna, implying that it also has an approximately isotropic
radiation pattern. The number of horizontal and vertical antennas can be
computed as $N_{\mathrm{H}} = L_{\mathrm{H}}/\Delta$ and $N_{\mathrm{V}} = L_{\mathrm{V}}/\Delta$, respectively. We will use this
notation to determine the beam pattern and then let $\Delta \to 0$ so that the UPA
is made of asymptotically many small antennas that we will call atoms. The
considered setup is shown in Figure 9.4.

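If a plane wave impinges on the surface from the azimuth angle $\varphi_i \in
[-\pi/2, \pi/2]$ and elevation angle $\theta_i \in [-\pi/2, \pi/2]$, then the relative phase-shifts
among the atoms are given by the respective entries of the array
response vector $\mathbf{a}_{N_{\mathrm{H}},N_{\mathrm{V}}}(\varphi_i, \theta_i)$ in (4.128). By following the Huygens-Fresnel
principle, we can obtain the reflected signal by considering transmission using
the precoding vector

$$
\mathbf{p} = \mathbf{a}_{N_{\mathrm{H}},N_{\mathrm{V}}}(\varphi_i, \theta_i) = \mathbf{a}^{*}_{N_{\mathrm{H}},N_{\mathrm{V}}}(-\varphi_i, -\theta_i), \tag{9.3}
$$

which corresponds to MRT (without power normalization) in the direction
$(-\varphi_i, -\theta_i)$. The beamforming gain that is observed in an arbitrary observation
direction, represented by the azimuth angle $\varphi_o \in [-\pi/2, \pi/2]$ and elevation
angle $\theta_o \in [-\pi/2, \pi/2]$, is obtained by multiplying with the array response
vector $\mathbf{a}_{N_{\mathrm{H}},N_{\mathrm{V}}}(\varphi_o, \theta_o)$ representing the channel in that direction:

$$
\begin{aligned}
B(\varphi_o, \theta_o) &= \left|\mathbf{a}^{\mathrm{T}}_{N_{\mathrm{H}},N_{\mathrm{V}}}(\varphi_o, \theta_o)\mathbf{p}\right|^2 \\
&= \left|\mathbf{a}^{\mathrm{T}}_{N_{\mathrm{H}},N_{\mathrm{V}}}(\varphi_o, \theta_o)\mathbf{a}^{*}_{N_{\mathrm{H}},N_{\mathrm{V}}}(-\varphi_i, -\theta_i)\right|^2.
\end{aligned} \tag{9.4}
$$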
A compact formula for this kind of expression was previously derived in
Section 4.5.3, using a slightly different notation. In our case, (4.139) turns
into

$$
B(\varphi_o, \theta_o) = \frac{\sin^2(\pi L_{\mathrm{H}}\Phi/\lambda)\,\sin^2(\pi L_{\mathrm{V}}\Omega/\lambda)}{\sin^2(\pi\Delta\Phi/\lambda)\,\sin^2(\pi\Delta\Omega/\lambda)}, \tag{9.5}
$$

where the impact of the angles is captured by the variables

$$
\Phi = \sin(\varphi_o)\cos(\theta_o) + \sin(\varphi_i)\cos(\theta_i), \tag{9.6}
$$

$$
\Omega = \sin(\theta_o) + \sin(\theta_i). \tag{9.7}
$$

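Suppose the channel gain from the transmitter to a hypothetical isotropic
antenna inside the surface is $\beta_t$. Each of the atoms has the (effective) area $\Delta^2$
and will experience the channel gain $\beta_t\Delta^2/A_{\mathrm{iso}}$ from the transmitter, where
$A_{\mathrm{iso}} = \frac{\lambda^2}{4\pi}$ is the area of an isotropic antenna. This is the fraction of the
transmitted power that reaches a single atom and will be reflected by it.
Similarly, suppose the channel gain from the hypothetical isotropic antenna
to a receiver in the observation direction is $\beta_r$. Each atom with area $\Delta^2$ will
then experience the channel gain $\beta_r\Delta^2/A_{\mathrm{iso}}$, which is the propagation loss
from an atom to the receiver. In conclusion, the end-to-end channel gain from
the transmitter to the receiver via the reflecting surface is

$$
\begin{aligned}
\beta &= \beta_t\beta_r\frac{\Delta^4}{A_{\mathrm{iso}}^2}B(\varphi_o, \theta_o) = \beta_t\beta_r\frac{\Delta^4}{A_{\mathrm{iso}}^2}\,\frac{\sin^2(\pi L_{\mathrm{H}}\Phi/\lambda)\,\sin^2(\pi L_{\mathrm{V}}\Omega/\lambda)}{\sin^2(\pi\Delta\Phi/\lambda)\,\sin^2(\pi\Delta\Omega/\lambda)} \\
&\approx \beta_t\beta_r\frac{\Delta^4}{A_{\mathrm{iso}}^2}\,\frac{\sin^2(\pi L_{\mathrm{H}}\Phi/\lambda)\,\sin^2(\pi L_{\mathrm{V}}\Omega/\lambda)}{(\pi\Delta\Phi/\lambda)^2\,(\pi\Delta\Omega/\lambda)^2} \\
&= \beta_t\beta_r\frac{L_{\mathrm{H}}^2L_{\mathrm{V}}^2}{A_{\mathrm{iso}}^2}\,\mathrm{sinc}^2\!\left(\frac{L_{\mathrm{H}}\Phi}{\lambda}\right)\mathrm{sinc}^2\!\left(\frac{L_{\mathrm{V}}\Omega}{\lambda}\right),
\end{aligned} \tag{9.8}
$$

where the approximation utilizes the fact that $\sin(x) \approx x$ when $x \approx 0$ and is
tight when $\Delta \to 0$. The last equality identifies the sinc-function expression,
where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. The expression in (9.8) shows how the channel
gain depends on the angles and captures all the essential channel properties, except
for polarization. The potential polarization mismatch between the transmitter and
receiver can be included in $\beta_t$ and $\beta_r$, which then also become angle-dependent [150].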
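The tightness of the approximation in (9.8) as $\Delta \to 0$ can be illustrated numerically. The sketch below evaluates the first line of (9.8), where the surface is made of discrete atoms, and the final sinc-function expression for a $4\lambda \times 4\lambda$ surface and shrinking atom sizes. The wavelength is normalized to 1, $\beta_t = \beta_r = 1$, and the observation angle is an illustrative assumption.

```python
import math


def sinc(x):
    """Normalized sinc-function sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def angle_variables(phi_o, theta_o, phi_i, theta_i):
    """Phi and Omega from (9.6) and (9.7)."""
    Phi = math.sin(phi_o) * math.cos(theta_o) + math.sin(phi_i) * math.cos(theta_i)
    Omega = math.sin(theta_o) + math.sin(theta_i)
    return Phi, Omega


def gain_atoms(L_H, L_V, spacing, angles, beta_t=1.0, beta_r=1.0, wavelength=1.0):
    """First line of (9.8): beta_t*beta_r*Delta^4/A_iso^2 * B(phi_o, theta_o), B from (9.5)."""
    A_iso = wavelength ** 2 / (4 * math.pi)
    Phi, Omega = angle_variables(*angles)

    def ratio(length, u):
        den = math.sin(math.pi * spacing * u / wavelength)
        if abs(den) < 1e-12:
            return (length / spacing) ** 2  # limit value: (number of atoms)^2
        return math.sin(math.pi * length * u / wavelength) ** 2 / den ** 2

    return beta_t * beta_r * spacing ** 4 / A_iso ** 2 * ratio(L_H, Phi) * ratio(L_V, Omega)


def gain_sinc(L_H, L_V, angles, beta_t=1.0, beta_r=1.0, wavelength=1.0):
    """Last line of (9.8), reached in the limit Delta -> 0."""
    A_iso = wavelength ** 2 / (4 * math.pi)
    Phi, Omega = angle_variables(*angles)
    return (beta_t * beta_r * (L_H * L_V / A_iso) ** 2
            * sinc(L_H * Phi / wavelength) ** 2 * sinc(L_V * Omega / wavelength) ** 2)


def relative_gaps(L_H, L_V, spacings, angles):
    """Relative difference between the discrete-atom gain and the sinc expression."""
    reference = gain_sinc(L_H, L_V, angles)
    return [abs(gain_atoms(L_H, L_V, d, angles) - reference) / reference for d in spacings]


if __name__ == "__main__":
    angles = (-math.pi / 6 + 0.1, 0.0, math.pi / 6, 0.0)  # (phi_o, theta_o, phi_i, theta_i)
    spacings = (0.2, 0.05, 0.01)  # atom sizes in wavelengths, all below lambda/4
    for d, gap in zip(spacings, relative_gaps(4.0, 4.0, spacings, angles)):
        print(f"Delta = {d:.2f} lambda: relative gap {gap:.1e}")
```

The gap shrinks roughly quadratically with the atom size, and in the specular direction the two expressions coincide because $\Phi = \Omega = 0$.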

The largest channel gain is obtained in (9.8) when $\Phi = \Omega = 0$. By
inspecting (9.6) and (9.7), we can conclude that the maximum is obtained
for the observation angles $\varphi_o = -\varphi_i$ and $\theta_o = -\theta_i$, which is expected from
Snell’s law and the previous discussion. If the reflected signal were a plane 
wave, the channel gain would be zero in all other directions, which is not the 
case. Instead, the angular gain variations are the same as for a UPA with the 
same physical size and can be analyzed as in Section 4.5.3. 
Example 9.1. What are the reflected signal’s first-null horizontal and vertical
beamwidths when a plane wave impinges from $\varphi_i = \theta_i = 0$?

The first nulls appear when the argument of one of the sinc-functions
in (9.8) is $\pm 1$. In the horizontal plane (i.e., $\theta_o = 0$), this happens for
$\varphi_o = \pm\arcsin(\lambda/L_{\mathrm{H}}) \approx \pm\lambda/L_{\mathrm{H}}$, where the approximation is tight when $L_{\mathrm{H}} \gg \lambda$.
Hence, the first-null horizontal beamwidth is approximately $2\lambda/L_{\mathrm{H}}$. It follows from
the same computation that the first-null vertical beamwidth is approximately $2\lambda/L_{\mathrm{V}}$.
The beamwidths are proportional to the wavelength, which demonstrates how a PEC
surface of a given physical size can give an extremely narrow beamwidth for visible
light but a relatively wide beamwidth for radio signals.

As the surface’s lengths $L_{\mathrm{H}}, L_{\mathrm{V}}$ grow large, for a given wavelength, the
beamwidths approach zero. This implies that the reflected signal will be a plane
wave with zero beamwidth in the asymptotic limit. This corresponds to ideal
specular reflection where the incident plane wave only changes direction.
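The beamwidth expressions in Example 9.1 can be evaluated for concrete wavelengths. The sketch below computes the exact first-null beamwidth $2\arcsin(\lambda/L)$ and the approximation $2\lambda/L$ for a hypothetical 1 m wide surface. It uses a radio wavelength of 0.1 m (roughly 3 GHz) and green light at 500 nm. It also confirms that the horizontal sinc-factor in (9.8) vanishes at the computed null. The surface size and wavelengths are illustrative assumptions.

```python
import math


def sinc(x):
    """Normalized sinc-function sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def first_null_beamwidth(side, wavelength):
    """First-null beamwidth in radians for phi_i = theta_i = 0, as in Example 9.1.
    Returns (exact, approximate) = (2*arcsin(lambda/L), 2*lambda/L); requires side > wavelength."""
    return 2 * math.asin(wavelength / side), 2 * wavelength / side


if __name__ == "__main__":
    side = 1.0  # hypothetical 1 m surface side
    for label, wavelength in (("radio (0.1 m)", 0.1), ("green light (500 nm)", 500e-9)):
        exact, approx = first_null_beamwidth(side, wavelength)
        # At the first null, the sinc argument L*sin(phi_o)/lambda equals 1.
        residual = sinc(side * math.sin(exact / 2) / wavelength)
        print(f"{label}: {math.degrees(exact):.4g} deg (approx. {math.degrees(approx):.4g} deg), "
              f"sinc at null = {residual:.1e}")
```

The same 1 m surface gives a beamwidth of roughly 11.5° for the radio signal but only about $10^{-6}$ rad for visible light, which matches the conclusion of the example.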


The maximum gain value in (9.8) can be factorized as

$$
\beta = \beta_t\underbrace{\frac{L_{\mathrm{H}}L_{\mathrm{V}}}{A_{\mathrm{iso}}}}_{=\,\text{Aperture gain for reception}}\beta_r\underbrace{\frac{L_{\mathrm{H}}L_{\mathrm{V}}}{A_{\mathrm{iso}}}}_{=\,\text{Beamforming gain for retransmission}}. \tag{9.9}
$$

Recall that $L_{\mathrm{H}}L_{\mathrm{V}}$ is the total area of the surface. The first term in (9.9) is the
channel gain from the transmitter to an isotropic-antenna-sized receiver surface,
while the second term $L_{\mathrm{H}}L_{\mathrm{V}}/A_{\mathrm{iso}}$ determines how much larger the aperture of
the surface is than that of an isotropic antenna. Hence, we will call the second term
the aperture gain, but it could also be interpreted as a receive beamforming gain.
This part of the expression models the fact that a surface collects an amount of
power from the impinging plane wave that is proportional to its area. The third
term in (9.9) is the channel gain from an isotropic-antenna-sized transmitter surface
to the receiver, while the fourth term $L_{\mathrm{H}}L_{\mathrm{V}}/A_{\mathrm{iso}}$ determines the transmit
beamforming gain delivered by the surface. This part of the expression highlights
how a big surface can beamform/reflect the signal with a narrower beamwidth and
a power concentration in the main beam proportional to its area.

Figure 9.5 shows the end-to-end channel gain in (9.8) observed in different angle directions $\varphi_o$ in the azimuth plane (where $\theta_o = 0$) when a plane wave impinges from the direction $\varphi_i = \pi/6$, $\theta_i = 0$. We consider a square surface with the side lengths $L = L_H = L_V \in \{4\lambda, 16\lambda\}$ and a propagation scenario with $\beta_t\beta_r = 10^{-8}$. The figure shows how the reflected signals are beams pointing in the direction $\varphi_o = -\pi/6 = -\varphi_i$, as expected. The horizontal beamwidth shrinks slightly when the surface increases in size, but the more dominant effect is the increased channel gain, which grows quadratically with the surface area. Hence, when each side increases by a factor of 4, the channel gain grows by $4^4 = 256 \approx 24$ dB. This is the combination of the aperture gain and the transmit beamforming gain.
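The 24 dB figure can be reproduced from (9.9) with a few lines of Python. This is a minimal sketch that assumes $A_{\mathrm{iso}} = \lambda^2/(4\pi)$, so that $L^2/A_{\mathrm{iso}} = 4\pi(L/\lambda)^2$ for a square $L \times L$ surface. It uses the product of the two channel gains equal to $10^{-8}$, as in the Figure 9.5 setup. The function name is hypothetical.

```python
import numpy as np

def max_reflection_gain_db(side_in_wavelengths, gain_product=1e-8):
    """Maximum end-to-end gain from (9.9) for a square L x L surface, in dB.

    With A_iso = lambda^2 / (4*pi), the ratio L_H*L_V / A_iso equals
    4*pi*(L/lambda)^2. It appears twice: once as the aperture gain and
    once as the transmit beamforming gain.
    """
    area_ratio = 4 * np.pi * side_in_wavelengths**2
    gain = gain_product * area_ratio**2
    return 10 * np.log10(gain)

g4 = max_reflection_gain_db(4)
g16 = max_reflection_gain_db(16)
print(f"L = 4 lambda: {g4:.2f} dB, L = 16 lambda: {g16:.2f} dB, "
      f"difference: {g16 - g4:.2f} dB")   # difference is 10*log10(4**4), about 24.08 dB
```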
Figure 9.5: The end-to-end channel gain in (9.8) from a transmitter in the direction $\varphi_i = \pi/6$ via a reflecting PEC surface to a receiver in a varying observation angle direction $\varphi_o$. The channel gain depends on the surface’s size $L \times L$ and the observation angle.


Example 9.2. How does the end-to-end channel gain in (9.8) relate to the 
radar range equation in (8.71)? 

The connection between these expressions can be identified by using the notation $\beta_t = G_t(\varphi_t,\theta_t)\lambda^2/(4\pi d_t)^2$ and $\beta_r = G_r(\varphi_r,\theta_r)\lambda^2/(4\pi d_r)^2$ for the channel gains of the LOS paths to and from the radar target, respectively. These channel gains depend on the propagation distances and antenna gains at the transmitter and receiver. The received power in (8.71) can then be expressed as


$$ P_r = P_t\beta_t\beta_r\frac{\sigma_{\mathrm{RCS}}}{A_{\mathrm{iso}}}. \tag{9.10} $$


The corresponding received power when using the reflecting surface is $P_t$ multiplied by the end-to-end channel gain in (9.8). By comparing (9.10) with (9.8), we can therefore identify the RCS of the surface as

$$ \sigma_{\mathrm{RCS}} = \frac{(L_H L_V)^2}{A_{\mathrm{iso}}}\,\mathrm{sinc}^2\!\left(\frac{L_H\Phi}{\lambda}\right)\mathrm{sinc}^2\!\left(\frac{L_V\Omega}{\lambda}\right). \tag{9.11} $$

This expression characterizes how the RCS depends on the incident and observation angles through $\Phi$ and $\Omega$. The largest value is achieved when $\Phi = \Omega = 0$, but we can also achieve zero RCS if $\Phi = \pm\lambda/L_H$ or $\Omega = \pm\lambda/L_V$. If the surface were rotated randomly with respect to the transmitter and receiver, then $\Phi$ and $\Omega$ would be random, and so would the angle-dependent RCS $\sigma_{\mathrm{RCS}}$. This is the core principle that leads to randomness in radar sensing.
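The angle dependence of the RCS in (9.11) can be evaluated directly. This minimal sketch assumes that the sinc in (9.11) is the normalized sinc, $\sin(\pi x)/(\pi x)$. That matches the first-null condition at argument $\pm 1$ used in Example 9.1, and it is also what `np.sinc` computes. The inputs `Phi` and `Omega` stand for the angle-dependent quantities defined in (9.8), which are not repeated here. The wavelength and surface size are hypothetical values.

```python
import numpy as np

def rcs(Phi, Omega, L_H, L_V, wavelength):
    """Angle-dependent RCS (m^2) from (9.11).

    np.sinc(x) = sin(pi*x)/(pi*x), so the first zeros occur at x = +/-1,
    i.e., at Phi = +/- wavelength/L_H or Omega = +/- wavelength/L_V.
    """
    A_iso = wavelength**2 / (4 * np.pi)
    return ((L_H * L_V)**2 / A_iso
            * np.sinc(L_H * Phi / wavelength)**2
            * np.sinc(L_V * Omega / wavelength)**2)

wavelength = 0.1   # hypothetical carrier wavelength (3 GHz)
L_H = L_V = 1.0    # hypothetical 1 m x 1 m surface
print(rcs(0.0, 0.0, L_H, L_V, wavelength))               # peak RCS, 4*pi/0.01 m^2
print(rcs(wavelength / L_H, 0.0, L_H, L_V, wavelength))  # first null, numerically ~0
```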
Figure 9.6: When a plane wave reaches the boundary between two mediums, a fraction of it is reflected backward and the remainder is transmitted into the new medium. The reflection coefficient $\Gamma_{01}$ determines these behaviors.


9.1.2 Reconfigurable Reflection from Heterogeneous Surfaces 


The previous section analyzed the reflection from a homogeneous surface, 
which reflects all the power of the incident wave as a beam pointing in the 
specular reflection direction. The situation is different when the surface is 
heterogeneous. We need the reflection coefficient to study that scenario. 

When a sinusoidal wave reaches the boundary between two mediums, their respective characteristic impedances determine what fraction of the signal is reflected back versus transmitted into the new medium. We let $Z_0$ denote the impedance of the first medium (e.g., free space) and $Z_1$ denote the impedance of the second medium (e.g., the surface). The reflection coefficient can then be computed as [151, Sec. 1.7]

$$ \Gamma_{01} = \frac{Z_1 - Z_0}{Z_1 + Z_0} \tag{9.12} $$

and is the reflected signal divided by the incident signal at the boundary between the mediums. The reflection coefficient can be complex, in which case $\arg(\Gamma_{01})$ represents the phase-shift that the signal incurs when it is reflected. This scenario is illustrated in Figure 9.6, where the first medium is free space (vacuum), in which the speed of light has been denoted by $c$ earlier in this book. A fraction $|\Gamma_{01}|^2$ of the power is reflected, while the remaining fraction $1 - |\Gamma_{01}|^2$ is transmitted into the new medium and might be absorbed by it. We will focus on the reflected signal.

A homogeneous surface has a constant impedance $Z_1$. This results in a power reduction by a factor of $|\Gamma_{01}|^2 \in [0,1]$, but otherwise it gives the same reflection behavior as in the previous section. However, suppose the surface is divided into $N$ small units that are structurally similar but have heterogeneous electrical properties. We call these metaatoms, and each has a specific impedance $Z_n$ for $n = 1,\ldots,N$. The corresponding reflection coefficients then become

$$ \Gamma_{0n} = \frac{Z_n - Z_0}{Z_n + Z_0}. \tag{9.13} $$
The characteristic impedance of free space is $Z_0 \approx 376.7 \approx 120\pi$ ohm, so it is real-valued. Suppose we design the metaatom using a reactance element with a purely imaginary impedance $Z_n = jX_n$ for some $X_n \in \mathbb{R}$. It then follows that

$$ |\Gamma_{0n}| = \left|\frac{jX_n - Z_0}{jX_n + Z_0}\right| = \frac{\sqrt{X_n^2 + Z_0^2}}{\sqrt{X_n^2 + Z_0^2}} = 1, \tag{9.14} $$

$$ \arg(\Gamma_{0n}) = \arg\left(\frac{jX_n - Z_0}{jX_n + Z_0}\right) = \begin{cases} \pi - 2\arctan\left(\frac{X_n}{Z_0}\right), & \text{if } X_n > 0, \\ -\pi - 2\arctan\left(\frac{X_n}{Z_0}\right), & \text{if } X_n < 0. \end{cases} \tag{9.15} $$

Such a metaatom reflects all the incident power since $|\Gamma_{0n}| = 1$, and it causes a phase-shift $\psi_n = \arg(\Gamma_{0n})$ that can be continuously tuned between $-\pi$ and $\pi$ by varying $X_n$. Such tuning can be achieved by configuring a capacitor that determines the capacitive part of the reactance, which can be implemented using a varactor diode.² This feature is the key to designing reconfigurable surfaces that can shape the reflected signals.
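The properties in (9.14) and (9.15) can be confirmed numerically. The following minimal sketch evaluates the reflection coefficient (9.13) for a purely reactive load $Z_n = jX_n$ and compares its phase with the closed-form expression. It uses $Z_0 = 120\pi$ ohm as the free-space impedance. The reactance values are arbitrary illustrations.

```python
import numpy as np

Z0 = 120 * np.pi   # free-space impedance, about 376.7 ohm

def reflection_coefficient(Z_n, Z_0=Z0):
    """Reflection coefficient (9.13) of a metaatom with impedance Z_n."""
    return (Z_n - Z_0) / (Z_n + Z_0)

def phase_from_reactance(X_n, Z_0=Z0):
    """Closed-form phase-shift (9.15) for a purely reactive load Z_n = j*X_n.

    X_n = 0 falls into the second branch and returns -pi, which is the same
    phase as pi modulo 2*pi (Gamma = -1).
    """
    if X_n > 0:
        return np.pi - 2 * np.arctan(X_n / Z_0)
    return -np.pi - 2 * np.arctan(X_n / Z_0)

reactances = [-2000.0, -Z0, -50.0, 50.0, Z0, 2000.0]
for X_n in reactances:
    gamma = reflection_coefficient(1j * X_n)
    print(f"X_n = {X_n:8.1f} ohm: |Gamma| = {abs(gamma):.6f}, "
          f"arg = {np.angle(gamma):+.4f} rad, (9.15) gives {phase_from_reactance(X_n):+.4f} rad")
```

As $X_n$ increases from large negative values toward zero, the phase moves from near 0 down toward $-\pi$. For positive $X_n$, it moves from near $\pi$ back toward 0. This is why a reactance range covering both signs is needed to reach the full phase interval.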

We will now return to the reflection example in Figure 9.4, which analyzed a homogeneous PEC surface. Suppose that surface is replaced by the one shown in Figure 9.7, which consists of $N = N_H N_V$ metaatoms whose varying impedance values correspond to phase-shifts between $-\pi$ and $\pi$. Each color represents a specific phase value, and we let $\psi_n \in [-\pi,\pi)$ denote the phase-shift incurred by the $n$th metaatom. If the incident plane wave arrives from the angular direction $(\varphi_i, \theta_i)$, then the incident phase-shifts over the surface are given by the array response vector $\mathbf{a}_{N_H,N_V}(\varphi_i,\theta_i)$. Each metaatom then adjusts its local incident phase value by $\psi_n$. Therefore, the phase profile of the retransmitted/reflected signals is given by the precoding vector

$$ \mathbf{p} = \mathbf{D}_\psi\mathbf{a}_{N_H,N_V}(\varphi_i,\theta_i) = \mathbf{D}_\psi\mathbf{a}^*_{N_H,N_V}(-\varphi_i,-\theta_i), \tag{9.16} $$

where the surface’s phase adjustments are applied using the diagonal reflection matrix

$$ \mathbf{D}_\psi = \mathrm{diag}\left(e^{j\psi_1},\ldots,e^{j\psi_N}\right). \tag{9.17} $$

The beamforming gain expression in (9.4) can then be updated as

$$ \begin{aligned} B(\varphi_o,\theta_o) &= \left|\mathbf{a}^{\mathrm{T}}_{N_H,N_V}(\varphi_o,\theta_o)\mathbf{p}\right|^2 \\ &= \left|\mathbf{a}^{\mathrm{T}}_{N_H,N_V}(\varphi_o,\theta_o)\mathbf{D}_\psi\mathbf{a}^*_{N_H,N_V}(-\varphi_i,-\theta_i)\right|^2. \end{aligned} \tag{9.18} $$

Since each entry of an array response vector is a complex exponential entirely determined by a phase value, we can turn $\mathbf{p}$ into any array response vector of our choice by selecting $\mathbf{D}_\psi$ accordingly. We can thereby control the main direction of the reflected beam. We can also generate precoding vectors that are not array response vectors if we prefer.


²A capacitor adds a negative value to $X_n$ and an inductor adds a positive value. If the metaatom is a circuit consisting of both fixed inductive elements and variable capacitive elements, then we can control $X_n$ over a range of both positive and negative values, resulting in the range of positive and negative phase-shifts shown in Figure 9.17.
Figure 9.7: A plane wave impinges on a flat surface in the yz-plane that consists of $N_H \times N_V$ reflecting metaatoms with heterogeneous properties. The metaatoms’ varying impedances cause different phase-shifts between $-\pi$ and $\pi$, as exemplified using colors. This enables the surface to control the channel gain in the observation directions $(\varphi_o, \theta_o)$.


Example 9.3. How should the surface’s phase-shifts be selected to point the reflected beam in a specific desired direction $(\varphi_d, \theta_d)$?

The reflected beam will point in that direction if $\mathbf{p} = \mathbf{a}^*_{N_H,N_V}(\varphi_d,\theta_d)$. By equating (9.16) to this value, we obtain the relation

$$ \mathbf{D}_\psi\underbrace{\mathbf{a}^*_{N_H,N_V}(-\varphi_i,-\theta_i)}_{=[a_{i,1},\ldots,a_{i,N}]^{\mathrm{T}}} = \underbrace{\mathbf{a}^*_{N_H,N_V}(\varphi_d,\theta_d)}_{=[a_{d,1},\ldots,a_{d,N}]^{\mathrm{T}}}. \tag{9.19} $$

The $n$th entry can be expressed as $e^{j\psi_n}a_{i,n} = a_{d,n}$, which holds if $\psi_n = \arg(a_{d,n}/a_{i,n})$. Hence, each metaatom compensates for the phase difference between the desired array response and the actual array response of the incident wave. Since the phase varies gradually in both vectors, the phase profile of the surface will also vary gradually, as exemplified in Figure 9.7.
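The phase selection in Example 9.3 can be sketched in Python. This is not the book’s Chapter 4 definition: it assumes a hypothetical half-wavelength-spaced UPA response with entries $e^{j\pi(n_H\sin\varphi\cos\theta + n_V\sin\theta)}$. For this model, negating both angles conjugates every entry. The vectors in (9.19) are taken as the conjugated responses $\mathbf{a}^*(-\varphi_i,-\theta_i)$ and $\mathbf{a}^*(\varphi_d,\theta_d)$. The array size and directions are arbitrary.

```python
import numpy as np

def upa_response(phi, theta, N_H, N_V):
    """Hypothetical half-wavelength-spaced UPA response (illustrative model).

    Entry (n_V, n_H) equals exp(j*pi*(n_H*sin(phi)*cos(theta) + n_V*sin(theta))),
    flattened into a vector of length N = N_H*N_V.
    """
    n_H = np.arange(N_H)
    n_V = np.arange(N_V)
    phase = np.pi * (n_H[None, :] * np.sin(phi) * np.cos(theta)
                     + n_V[:, None] * np.sin(theta))
    return np.exp(1j * phase).ravel()

N_H, N_V = 8, 4
N = N_H * N_V
phi_i, theta_i = np.pi / 6, 0.0           # incident direction
phi_d, theta_d = -np.pi / 4, np.pi / 12   # desired reflection direction (arbitrary)

a_i = np.conj(upa_response(-phi_i, -theta_i, N_H, N_V))   # left-hand vector in (9.19)
a_d = np.conj(upa_response(phi_d, theta_d, N_H, N_V))     # right-hand vector in (9.19)

# psi_n = arg(a_{d,n} / a_{i,n}); np.angle returns values in (-pi, pi]
psi = np.angle(a_d / a_i)
D_psi = np.diag(np.exp(1j * psi))   # reflection matrix (9.17)
p = D_psi @ a_i                     # reflected precoding vector

# Beamforming gain |a^T(phi_d, theta_d) p|^2 in the desired direction
gain = abs(upa_response(phi_d, theta_d, N_H, N_V) @ p) ** 2
print(np.allclose(p, a_d), gain)       # True and (numerically) N**2 = 1024
print(psi.reshape(N_V, N_H).round(2))  # phase profile over the surface
```

With unnormalized vectors, the gain in the desired direction is $N^2$. This is consistent with the combined aperture and beamforming gains discussed after (9.9).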


Another way to control the direction of the reflected beam would be to 
rotate the surface mechanically, but greater flexibility is achieved by the 
electrical implementation described above. A similar discussion was made 
in relation to Example 4.22, which compared the mechanical and electrical 
downtilt of an antenna array. When considering reflecting surfaces, we seek a 
way to deploy them on building facades, as illustrated in Figure 9.1, to point 
the reflection angle toward the receiving user without mechanical rotations. 
Using the radar sensing terminology from Section 8.3, we want to configure 
the electrical properties of the surface to achieve the largest possible RCS in 
the direction leading to the receiver. 
Figure 9.8: The end-to-end channel gain in (9.8) from a transmitter in the direction $\varphi_i = \pi/6$ via a reflecting surface of size $L \times L$ to a receiver in a varying observation angle direction $\varphi_o$. The smaller the reflector is, the more circular/isotropic its radiation pattern is.


We use the term “metaatom” when referring to each controllable piece of the surface to signify that the pieces are tiny compared to the wavelength. This matters because a small object provides approximately diffuse reflection with no preferred directivity, even if it is flat. Figure 9.8 shows the end-to-end channel gain in (9.8) observed in different azimuth angle directions $\varphi_o$ when a plane wave impinges from the direction $\varphi_i = \pi/6$, $\theta_i = 0$. The setup is the same as in Figure 9.5, but now the surface dimensions are $L \times L$ with $L \in \{\lambda/6, \lambda/2\}$. Both sizes result in a reflected signal that is spread over all angles, even back toward the transmitter, but the radiation pattern becomes more circular (i.e., closer to isotropic) as the size shrinks. For the reconfigurable surface to fully steer the direction of the reflected signal, it should be made of many tiny controllable pieces. Each piece lacks a preferred directivity, but together they can jointly beamform the reflected signal where we want it to go.


9.1.3 Terminology and Implementation Aspects 


Reconfigurable surfaces are often associated with metamaterials, which are 
engineered materials typically containing sub-wavelength-sized structures that 
create a heterogeneous impedance profile over the surface. These structures 
are typically referred to as metaatoms, which is why we have already adopted 
that terminology in this chapter. The engineered material concept was first 
utilized in communications to design static reflectarrays with a fixed reflection 
matrix $\mathbf{D}_\psi$ that was not a scaled identity matrix. This results in an anomalous
reflection angle that differs from Snell’s law [152]. This was followed by 


Figure 9.9: The photo shows the reconfigurable surface prototype from [159], which is designed 
for the 5.8 GHz band. It consists of $N = 1100$ metaatoms, arranged as a UPA with $N_V = 20$ rows and $N_H = 55$ metaatoms per row. The impedance of each metaatom is controlled using
two varactor diodes, which enable phase control of the reflected signals over a range of 240°. 


reconfigurable reflectarrays that can electrically tune the reflection matrix 
[153]. The purpose was to build transmitters/receivers consisting of a single 
antenna pointing towards the reflectarray that controls the beam direction. 
This is an alternative implementation of the analog beamforming architecture, 
discussed in Section 7.3.1, that is used particularly for satellite communications and radars but has also been commercialized for mmWave transceivers.

The alternative concept of deploying reconfigurable surfaces in the propa- 
gation environment to relay signals between a transmitter and receiver gained 
traction in the late 2010s. This concept has been called software-controlled 
metasurfaces [154], reconfigurable intelligent surfaces (RIS) [155], intelligent 
reflecting surfaces (IRS) [156], and reconfigurable intelligent metasurfaces [157]. 
In this book, we will call them reconfigurable surfaces. There is a wealth of 
implementation challenges and details that go beyond the purpose of this book. 
We refer to [158] for a review of software-controlled metasurfaces designed 
for everything from the low-band to the infrared frequency range. While it 
is possible to build reconfigurable surfaces that can vary the phase-shifts 
continuously using varactor diodes, many designs use PIN diodes that can be 
switched on and off to shift between a discrete set of phase values. 

A reconfigurable surface prototype from [159] with N = 1100 metaatoms is 
shown in Figure 9.9. Each metaatom contains two metallic patches connected 
to varactor diodes, which are controlled by an external bias voltage to tune 
the impedance. This enables the prototype to select phases $\psi_n \in [-120^\circ, 120^\circ]$. The corresponding power loss varies slightly with the phase but satisfies $1 - |\Gamma_{0n}|^2 < -3$ dB. The indoor and outdoor measurements presented in
[159] verify that the prototype can change the angle of the reflected beam as 
described above. In conclusion, reconfigurable surfaces can be implemented, 
and the remainder of this chapter will analyze how they can aid communication 
and radar systems. 
9.2 Narrowband Communication using Reconfigurable Surfaces


We will now analyze how a reconfigurable surface can be tuned to aid a point-to-point communication system. We begin by considering a narrowband SISO channel between a single-antenna transmitter and a single-antenna receiver. The received signal was stated in (2.144) as

$$ y = hx + n, \tag{9.20} $$

where $h$ is the channel coefficient, $x \sim \mathcal{N}_{\mathbb{C}}(0, q)$ is the capacity-achieving transmit signal, and $n \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$ is independent noise. According to Corollary 2.1, the capacity of such a channel can be expressed as

$$ C = \log_2\left(1 + \frac{q|h|^2}{N_0}\right) \text{ bit/symbol}. \tag{9.21} $$
When a reconfigurable surface is deployed in the propagation environment, it affects how the channel coefficient $h$ is modeled. Suppose the surface consists of $N$ metaatoms that reflect all the incident power with the controllable phase-shifts $\psi_n$ for $n = 1,\ldots,N$. This setup is illustrated in Figure 9.10. In general, the end-to-end channel can be modeled as

$$ h = h_s + \sum_{n=1}^{N} h_{r,n}e^{j\psi_n}h_{t,n}, \tag{9.22} $$

where the static channel $h_s \in \mathbb{C}$ includes all propagation paths unaffected by the surface. The propagation path via metaatom $n$ is described by three factors: the channel coefficient $h_{t,n} \in \mathbb{C}$ from the transmitter to the metaatom, the phase-shift $e^{j\psi_n}$ incurred by the metaatom, and the channel coefficient $h_{r,n} \in \mathbb{C}$ from the metaatom to the receiver. These three coefficients are multiplied together following Section 9.1.1. Since waves that travel through different paths are superimposed at the receive antenna, the channel coefficients are added up as in (9.22). We can express (9.22) in the matrix/vector form

$$ h = h_s + \mathbf{h}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_t \tag{9.23} $$

by introducing the notation

$$ \mathbf{h}_t = \begin{bmatrix} h_{t,1} \\ \vdots \\ h_{t,N} \end{bmatrix}, \quad \mathbf{h}_r = \begin{bmatrix} h_{r,1} \\ \vdots \\ h_{r,N} \end{bmatrix}, \tag{9.24} $$

and recalling the reflection matrix notation $\mathbf{D}_\psi = \mathrm{diag}(e^{j\psi_1},\ldots,e^{j\psi_N})$ from (9.17). Using this notation, the capacity in (9.21) for a given value of $\mathbf{D}_\psi$ becomes

$$ \log_2\left(1 + \frac{q\left|h_s + \mathbf{h}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_t\right|^2}{N_0}\right). \tag{9.25} $$
Figure 9.10: An example of a narrowband SISO channel aided by a reconfigurable surface with $N$ metaatoms. The SIMO channel from the single-antenna transmitter to the surface is denoted by $\mathbf{h}_t$, and the MISO channel from the surface to the single-antenna receiver is denoted by $\mathbf{h}_r$. Each metaatom incurs a tunable phase-shift $\psi_n$, and the static channel that does not involve the surface is denoted as $h_s$. The end-to-end channel coefficient becomes $h = h_s + \mathbf{h}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_t$, where $\mathbf{D}_\psi = \mathrm{diag}(e^{j\psi_1},\ldots,e^{j\psi_N})$ is the reflection matrix.


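We can aid the system by identifying the reflection matrix that maximizes this capacity expression. In particular, we want to maximize

$$ \begin{aligned} \left|h_s + \mathbf{h}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_t\right|^2 &= \left|\begin{bmatrix} \sqrt{|h_s|} \\ \sqrt{|h_{r,1}h_{t,1}|} \\ \vdots \\ \sqrt{|h_{r,N}h_{t,N}|} \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} \sqrt{|h_s|}\,e^{j\arg(h_s)} \\ \sqrt{|h_{r,1}h_{t,1}|}\,e^{j(\arg(h_{r,1}h_{t,1})+\psi_1)} \\ \vdots \\ \sqrt{|h_{r,N}h_{t,N}|}\,e^{j(\arg(h_{r,N}h_{t,N})+\psi_N)} \end{bmatrix}\right|^2 \\ &\leq \left\|\begin{bmatrix} \sqrt{|h_s|} \\ \sqrt{|h_{r,1}h_{t,1}|} \\ \vdots \\ \sqrt{|h_{r,N}h_{t,N}|} \end{bmatrix}\right\|^2 \left\|\begin{bmatrix} \sqrt{|h_s|}\,e^{j\arg(h_s)} \\ \sqrt{|h_{r,1}h_{t,1}|}\,e^{j(\arg(h_{r,1}h_{t,1})+\psi_1)} \\ \vdots \\ \sqrt{|h_{r,N}h_{t,N}|}\,e^{j(\arg(h_{r,N}h_{t,N})+\psi_N)} \end{bmatrix}\right\|^2 \\ &= \left(|h_s| + \sum_{n=1}^{N}|h_{r,n}h_{t,n}|\right)^2, \end{aligned} \tag{9.26} $$

where the second row follows from the Cauchy-Schwarz inequality in (2.18). The upper bound in that inequality is achieved if and only if the two vectors are equal, except for a scaling factor. The first entries of the two vectors differ by the phase-shift $e^{j\arg(h_s)}$, which is determined by the static channel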
coefficient and cannot be changed. The other entries differ by a phase-shift that the surface can control. Thus, we can attain the upper bound by matching their phases to that of the first entry. In particular, the entry corresponding to metaatom $n$ must be configured such that

$$ e^{j(\arg(h_{r,n}h_{t,n})+\psi_n)} = e^{j\arg(h_s)} \;\Rightarrow\; \psi_n = \arg(h_s) - \arg(h_{r,n}h_{t,n}) + 2\pi k_n \tag{9.27} $$

for the integer $k_n$ that gives $\psi_n \in [-\pi,\pi)$. This solution can also be obtained as $\psi_n = \arg(h_s/(h_{r,n}h_{t,n}))$ if $h_s \neq 0$ and $h_{r,n}h_{t,n} \neq 0$. We notice that the capacity-maximizing configuration removes the phase-shift $\arg(h_{r,n}h_{t,n})$ created by the channels to and from the metaatom and replaces it with the phase-shift $\arg(h_s)$ of the static channel. In this way, the signals that the $N$ metaatoms reflect reach the receiver with a phase that matches that of the signal propagating through the static channel. We have proved the following result.


Corollary 9.1. Consider a discrete memoryless SISO channel aided by a reconfigurable surface, for which the channel coefficient is

$$ h = h_s + \sum_{n=1}^{N} h_{r,n}e^{j\psi_n}h_{t,n}. \tag{9.28} $$

The channel capacity is maximized by configuring the surface as $\psi_n = \arg(h_s) - \arg(h_{r,n}h_{t,n}) + 2\pi k_n$, where $k_n$ is the integer that gives $\psi_n \in [-\pi,\pi)$, for $n = 1,\ldots,N$. This results in the capacity

$$ C = \log_2\left(1 + \frac{q\left(|h_s| + \sum_{n=1}^{N}|h_{r,n}h_{t,n}|\right)^2}{N_0}\right) \text{ bit/symbol}. \tag{9.29} $$
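The configuration in Corollary 9.1 is straightforward to implement. The following minimal sketch draws hypothetical Rayleigh-fading channels, which serve only to exercise the formulas and are not a channel model from the text. It computes the capacity-maximizing phases, wraps them into $[-\pi,\pi)$, and compares the resulting channel gain with the upper bound in (9.26) and with a random configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

def crandn(size=None):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.normal(size=size) + 1j * rng.normal(size=size)) / np.sqrt(2)

# Hypothetical Rayleigh-fading channels, used only to exercise the formulas
h_s, h_t, h_r = crandn(), crandn(N), crandn(N)

def end_to_end(psi):
    """Channel coefficient h = h_s + h_r^T D_psi h_t from (9.23)."""
    return h_s + h_r @ (np.exp(1j * psi) * h_t)

# Capacity-maximizing phases from Corollary 9.1, wrapped into [-pi, pi)
psi_opt = np.angle(h_s) - np.angle(h_r * h_t)
psi_opt = (psi_opt + np.pi) % (2 * np.pi) - np.pi

gain_opt = abs(end_to_end(psi_opt)) ** 2
gain_bound = (abs(h_s) + np.sum(abs(h_r * h_t))) ** 2   # right-hand side of (9.26)
gain_random = abs(end_to_end(rng.uniform(-np.pi, np.pi, N))) ** 2

print(f"optimized {gain_opt:.3f}, upper bound {gain_bound:.3f}, random phases {gain_random:.3f}")
```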


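The maximum end-to-end channel gain is the squared sum of the amplitudes $|h_s|$ and $|h_{r,n}h_{t,n}|$ for $n = 1,\ldots,N$. If we define the effective channel vector

$$ \bar{\mathbf{h}} = \begin{bmatrix} h_s \\ h_{r,1}h_{t,1} \\ \vdots \\ h_{r,N}h_{t,N} \end{bmatrix}, \tag{9.30} $$

we can alternatively express the end-to-end channel gain using the 1-norm³ as

$$ \left(|h_s| + \sum_{n=1}^{N}|h_{r,n}h_{t,n}|\right)^2 = \|\bar{\mathbf{h}}\|_1^2. \tag{9.31} $$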


³The 1-norm is defined for an arbitrary vector $\mathbf{x} \in \mathbb{C}^M$ as $\|\mathbf{x}\|_1 = \sum_{m=1}^{M}|x_m|$. It is also known as the Manhattan norm since it adds up the distances in the $M$ dimensions as if one has to travel along straight perpendicular streets on a map.
Example 9.4. Suppose the $N$ reflected paths have the same propagation losses: $|h_{t,n}|^2 = \beta_t$ and $|h_{r,n}|^2 = \beta_r$ for $n = 1,\ldots,N$. How does the end-to-end channel gain behave when the channel gain $|h_s|^2 = \beta_s$ of the static path is either relatively weak or strong?

The contribution from the metaatoms to (9.29) is $\sum_{n=1}^{N}|h_{r,n}h_{t,n}| = N\sqrt{\beta_r\beta_t}$ under these assumptions. Hence, the end-to-end channel gain becomes

$$ \left(\sqrt{\beta_s} + N\sqrt{\beta_r\beta_t}\right)^2 \approx \begin{cases} N^2\beta_r\beta_t, & \text{if } \beta_s \text{ is small}, \\ \beta_s, & \text{if } \beta_s \text{ is large}, \end{cases} \tag{9.32} $$

where “small” means that $\beta_s \ll N^2\beta_r\beta_t$ and “large” means that $\beta_s \gg N^2\beta_r\beta_t$. In the former case, when the vast majority of the received power comes from the surface, the end-to-end channel gain is proportional to $N^2\beta_r\beta_t$. This term grows quadratically with the number of metaatoms, thanks to an aperture gain of $N$ and a transmit beamforming gain of $N$. When the static path is relatively strong (i.e., $\beta_s \gg N^2\beta_r\beta_t$), the reconfigurable surface barely affects the end-to-end channel gain, which is approximately equal to $\beta_s$.

The conclusion is that physically large reconfigurable surfaces are much more effective than small surfaces, thanks to the quadratic scaling law. This is important because the $N^2$ factor is multiplied by $\beta_r\beta_t$, which is the product of two channel gains that can both be very small numbers.


We will now use the exact expression on the left-hand side of (9.32) to demonstrate how the number of metaatoms affects communication performance. Figure 9.11 shows the capacity as a function of the number of metaatoms when $\beta_t = -80$ dB, $\beta_r = -60$ dB, and $q/N_0 = 100$ dB. We compare two types of static paths: $\beta_s = -80$ dB (strong) and $\beta_s = -110$ dB (weak). When the static path is weak, the capacity is nearly zero for $N = 0$ but grows rapidly as metaatoms are added to the surface. In this case, the $N^2$ SNR growth from (9.32) dominates. By contrast, when the static path is strong, the capacity is already quite high for $N = 0$, and a huge surface is needed before it has a noticeable impact on the capacity. What matters is the strength of the propagation path provided by the surface relative to the static path, not how much power it provides in an absolute sense. This indicates that reconfigurable surfaces are particularly valuable in deployment scenarios with weak static paths, where even a small surface makes a significant difference.
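The trends in Figure 9.11 can be reproduced approximately from (9.29) and the exact left-hand side of (9.32). This minimal sketch uses the parameter values stated above. The assignment of $-80$ dB to $\beta_t$ and $-60$ dB to $\beta_r$ is an assumption, but only their product enters the formula. The values of $N$ are arbitrary sample points, not the exact grid used in the figure.

```python
import numpy as np

def db2pow(x_db):
    """Convert a value in dB to linear scale."""
    return 10 ** (x_db / 10)

beta_t, beta_r = db2pow(-80), db2pow(-60)   # per-metaatom channel gains
snr = db2pow(100)                           # q / N_0

def capacity(N, beta_s_db):
    """Capacity (9.29) with equal per-path gains, using the exact expression in (9.32)."""
    gain = (np.sqrt(db2pow(beta_s_db)) + N * np.sqrt(beta_r * beta_t)) ** 2
    return np.log2(1 + snr * gain)

for N in [0, 100, 1000, 10000]:
    print(f"N = {N:5d}: weak static path {capacity(N, -110):5.2f} bit/symbol, "
          f"strong static path {capacity(N, -80):5.2f} bit/symbol")
```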

While Section 9.1 considered reflections in LOS scenarios, we have not 
assumed any specific channel model in this section. By optimizing the re- 
flection matrix, we can find the phase-shift profile of a surface with fixed 
dimensions that maximizes the received signal power in a given propagation 
environment. We could mechanically bend and deform a homogeneous surface 
to obtain the corresponding physical shape, but instead, a reconfigurable 
surface synthesizes that shape using a heterogeneous impedance pattern. This 
Figure 9.11: The capacity of a SISO channel that is aided by a reconfigurable surface. The capacity increases with the number of metaatoms, and the relative improvement is particularly large when the channel gain $\beta_s$ of the static path is weak.
(a) Reflection from a bent surface. (b) Reflection from a reconfigurable surface.


Figure 9.12: The shape and electrical properties of the reflecting surface jointly determine the reflection direction. In (a), a homogeneous surface is bent to reflect the incident wave toward the
receiver. In (b), a flat reconfigurable surface has a phase-shift profile that achieves the same 
result by adding extra phase-shifts to signals in the center compared to the edges. 


enables us to reconfigure the reflection properties rapidly when the environ- 
ment or transmitter/receiver locations change. Figure 9.12 illustrates this 
principle. When the wave reaches the surface from the right, a parabolically 
bent surface will reflect the signal toward the indicated receiver, as shown in 
Figure 9.12(a). A flat, reconfigurable surface can synthesize the same reflection 
behavior by adding extra phase-shifts to the wave components reflected at 
its center compared to its edges. This is illustrated in Figure 9.12(b), where
the coloring behind the surface represents the phase-shifts using the same 
scale as in Figure 9.7. The phase-shifts are negative in this scenario since they 
represent extra delays incurred by the metaatoms to synthesize a bent surface. 


9.2.1 Line-of-Sight Channel Modeling and Surface Placement 


The channel gain can be computed precisely in free-space LOS propagation using the formulas provided in Section 1.1.4. Suppose the distance from the transmitter to the surface is $d_t$ and the distance from the surface to the receiver is $d_r$. Suppose further that the transmitter has the antenna gain $G_t(\varphi_t, \theta_t)$ towards the surface and that each small metaatom has the area $A_m$ (i.e., the antenna gain is $\frac{4\pi}{\lambda^2}A_m$). It then follows from (1.40) that the channel gain between them is

$$ \beta_t = \frac{\lambda^2}{(4\pi d_t)^2}G_t(\varphi_t,\theta_t)\frac{4\pi}{\lambda^2}A_m = \frac{G_t(\varphi_t,\theta_t)A_m}{4\pi d_t^2}. \tag{9.33} $$

Similarly, if the receiver has the antenna gain $G_r(\varphi_r, \theta_r)$ towards the surface, then the channel gain between them is

$$ \beta_r = \frac{\lambda^2}{(4\pi d_r)^2}G_r(\varphi_r,\theta_r)\frac{4\pi}{\lambda^2}A_m = \frac{G_r(\varphi_r,\theta_r)A_m}{4\pi d_r^2}. \tag{9.34} $$

When the propagation path via the surface dominates over the static path, it follows from (9.32) that the end-to-end channel gain can be expressed as

$$ N^2\beta_t\beta_r = N^2\frac{G_t(\varphi_t,\theta_t)A_m}{4\pi d_t^2}\frac{G_r(\varphi_r,\theta_r)A_m}{4\pi d_r^2} = N^2\frac{G_t(\varphi_t,\theta_t)G_r(\varphi_r,\theta_r)A_m^2}{(4\pi d_t d_r)^2}, \tag{9.35} $$

where we utilized the gain expressions in (9.33) and (9.34). There are many squares in (9.35) because the end-to-end channel gain is the product of two conventional channel gain expressions. First, the metaatom’s area is squared because it appears in both gain expressions. It is also multiplied by $N^2$, which implies that it is the total area $NA_m$ of the surface that determines the end-to-end gain. Second, the squared propagation distances $d_t^2$ and $d_r^2$ appear in the expression since the signal power attenuates inversely proportionally to them in free-space propagation. The distances are also multiplied together.

In NLOS propagation scenarios, any channel model could be used for the individual channels. The only important aspect is to account for the small area $A_m$ of each metaatom, which results in a gain of $\frac{4\pi}{\lambda^2}A_m$. This value is smaller than one since we consider metaatoms that are small enough to capture and retransmit power almost isotropically.

It is not only the number of metaatoms that determines the end-to-end channel gain but also where the reconfigurable surface is deployed. Ideally, the transmitter and receiver should have LOS channels to the surface because these are generally stronger than NLOS channels. When multiple deployment locations satisfy that condition, further characteristics can be considered. The following example highlights one key property.
Example 9.5. Suppose a transmitter and a receiver with isotropic antennas
are located in the same horizontal plane. They are both 10m from a long wall 
but 100m apart from each other. There is no static path between them, but 
they can see any point on the wall. Where on the wall should a reconfigurable 
surface be deployed to maximize the end-to-end channel gain? 

We let $N$ denote the number of metaatoms, $A_m$ denote the area per metaatom, and $\sqrt{10^2 + d_w^2}$ be the distance between the transmitter and the surface. Here, $d_w \in [0, 100]$ m is the distance along the wall, measured from the point closest to the transmitter. The distance between the surface and the receiver is then given as $\sqrt{10^2 + (100 - d_w)^2}$. Since we have LOS channels and isotropic transmit and receive antennas, the end-to-end channel gain can be expressed using (9.35) as

$$ N^2\beta_t\beta_r = N^2\frac{A_m}{4\pi\left(10^2 + d_w^2\right)}\frac{A_m}{4\pi\left(10^2 + (100 - d_w)^2\right)}. \tag{9.36} $$

The deployment location affects the term $(10^2 + (100 - d_w)^2)(10^2 + d_w^2)$ in the denominator. This term has the first-order derivative

$$ \begin{aligned} \frac{\partial}{\partial d_w}\left(10^2 + (100 - d_w)^2\right)\left(10^2 + d_w^2\right) &= 4d_w^3 - 600d_w^2 + 20400d_w - 20000 \\ &= 4(d_w - 50)\left(d_w^2 - 100d_w + 100\right), \end{aligned} \tag{9.37} $$

which has the roots $d_w = 50$ m, $d_w = 50 - 20\sqrt{6} \approx 1$ m, and $d_w = 50 + 20\sqrt{6} \approx 99$ m. The first root is a maximum of the denominator and the other two are minima, as can be proved by checking the signs of the second-order derivative. Hence, the channel gain is minimized when the surface is deployed in the middle and maximized when it is close to the transmitter or receiver.


The conclusion from this example is that we should look for all deployment 
locations where the surface has LOS conditions to both the base station and 
prospective user locations. Among these locations, we should pick the one 
that is closest to either of them since this maximizes the channel gain. This 
insight motivates the holographic MIMO architecture, mentioned briefly in 
Section 7.4.2, where a metasurface is deployed as a part of the base station to 
create an analog beamforming architecture with small antenna spacing. In 
this chapter, the reconfigurable surface is meant to be decoupled from the 
transmitter and receiver, but it should preferably be quite near one of them. 

Figure 9.13 shows the end-to-end channel gain in (9.36) for $N = 200$ metaatoms that each have the area $A_m = (\lambda/4)^2$, where $\lambda = 0.1$ m is the wavelength. As expected from the example above, the maximum channel gain is achieved close to either the transmitter or receiver, while the minimum value is obtained in the middle. The difference is around 8 dB in this example. This is substantial but not huge, considering that all the channel gains involved are on the order of $-100$ dB.
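The numbers behind Figure 9.13 can be recomputed from (9.33)–(9.36). This minimal sketch uses isotropic antennas, $N = 200$, $\lambda = 0.1$ m, and $A_m = (\lambda/4)^2$ as stated above. It evaluates the gain on a fine grid of wall positions, which is a numerical choice made here, and locates the extreme values.

```python
import numpy as np

wavelength = 0.1              # meters
A_m = (wavelength / 4) ** 2   # area per metaatom
N = 200                       # number of metaatoms
d_w = np.linspace(0, 100, 100001)   # surface position along the wall (m)

# Isotropic antennas: (9.33) and (9.34) with G_t = G_r = 1
beta_t = A_m / (4 * np.pi * (10**2 + d_w**2))
beta_r = A_m / (4 * np.pi * (10**2 + (100 - d_w)**2))
gain_db = 10 * np.log10(N**2 * beta_t * beta_r)   # (9.36) in dB

best, worst = np.argmax(gain_db), np.argmin(gain_db)
print(f"max {gain_db[best]:.2f} dB at d_w = {d_w[best]:.2f} m")
print(f"min {gain_db[worst]:.2f} dB at d_w = {d_w[worst]:.2f} m")
print(f"difference {gain_db[best] - gain_db[worst]:.2f} dB")
```

The gain has two equal maxima, at $50 \mp 20\sqrt{6}$ m, so `np.argmax` may report either one. At these points the denominator term in (9.36) equals $10^6$, while it equals $2600^2 = 6.76\cdot 10^6$ at the midpoint. That gives a difference of $10\log_{10}(6.76) \approx 8.3$ dB.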
Figure 9.13: The end-to-end channel gain in (9.36) depending on how far the reconfigurable
surface is from the transmitter, when the total distance between the transmitter and receiver is 
100m. The stars show the maximum and minimum values, which were derived in Example 9.5. 


9.2.2 Acquiring Channel State Information and Feedback Signaling 


The reconfigurable surface requires CSI to compute the capacity-maximizing reflection matrix. Specifically, (9.27) shows that it must know the phase $\arg(h_s)$ of the static channel and the phases $\arg(h_{r,n}h_{t,n})$ of the paths through each of the $N$ metaatoms. It is sufficient to know the phase of the cascaded channel coefficient $h_{r,n}h_{t,n}$, while the individual characteristics of $h_{r,n}$ and $h_{t,n}$ are unimportant. These $N+1$ real-valued phase coefficients can be estimated by sending a pilot sequence, similar to the point-to-point scenario described in Section 4.2.4. The precise details are somewhat different since the reconfigurable surface can only reflect signals, not measure them. To describe the procedure, we begin by factorizing the end-to-end channel in (9.22) as

$$ h = h_s + \sum_{n=1}^{N} h_{r,n}e^{j\psi_n}h_{t,n} = \underbrace{\begin{bmatrix} 1 \\ e^{j\psi_1} \\ \vdots \\ e^{j\psi_N} \end{bmatrix}^{\mathrm{T}}}_{=\boldsymbol{\psi}^{\mathrm{T}}} \underbrace{\begin{bmatrix} h_s \\ h_{r,1}h_{t,1} \\ \vdots \\ h_{r,N}h_{t,N} \end{bmatrix}}_{=\bar{\mathbf{h}}}, \tag{9.38} $$

where $\boldsymbol{\psi} \in \mathbb{C}^{N+1}$ contains all the configurable phase-shifts (including a 1 for the static path) and $\bar{\mathbf{h}} \in \mathbb{C}^{N+1}$ contains all the necessary channel coefficients. This vector was previously defined in (9.30). Every time a signal is transmitted over the channel, it experiences the scalar channel coefficient $h$, obtained as the inner product between the channel vector $\bar{\mathbf{h}}$ and the complex conjugate of the phase-shift vector $\boldsymbol{\psi}$. Hence, only the one-dimensional part of the $(N+1)$-dimensional channel vector that is aligned with $\boldsymbol{\psi}^*$ is utilized to reflect the signal toward the receiver, while the remaining dimensions are invisible to the receiver. For the receiver to observe the entire channel vector, we must send multiple pilot signals and vary the phase-shift configuration vector to explore all the dimensions of $\mathbb{C}^{N+1}$ where $\bar{\mathbf{h}}$ might have a component.

By following the notation from Section 4.2.4, we consider the transmission 
of a preamble of length Lp designed to enable channel estimation. Specifically, a 


constant pilot sequence z|l] = \/q is transmitted for l = 1,..., Lp and we reflect 
it using a sequence of different configuration vectors: ¢[1],..., [Zp] € CAH. 
The received signal at time instance l can then be expressed as 
yll =o [hyqtnfi], 1=1,...,Lp, (9.39) 
but it is convenient to write it in matrix/vector form as 
y{1] n[1] 
: [=W ... vE iva] i (9.40) 
—_ SS 
y[Lp] =v n[Lp] 
=ý =n 


If the channel vector h is treated as deterministic but unknown, the PDF of 
ý € C% can be expressed using (2.80) as 
1 _ liy-whyall? 
No 


KO) = aera” 


ar (9.41) 


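because $\mathbf{y} - \boldsymbol{\Psi}\bar{\mathbf{h}}\sqrt{q} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, N_0\mathbf{I}_{L_p})$. The ML estimate $\hat{\mathbf{h}}$ of $\bar{\mathbf{h}}$ is the vector that maximizes the PDF, which corresponds to minimizing the squared norm expression in its exponent. By equating the argument of the norm to zero, we obtain

$$
\mathbf{y} - \boldsymbol{\Psi}\hat{\mathbf{h}}\sqrt{q} = \mathbf{0} \quad \Rightarrow \quad \hat{\mathbf{h}} = \frac{1}{\sqrt{q}}\boldsymbol{\Psi}^{-1}\mathbf{y} \tag{9.42}
$$

if the matrix $\boldsymbol{\Psi} \in \mathbb{C}^{L_p \times (N+1)}$ is invertible. This requires that $L_p = N+1$ since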
only square matrices are invertible. Moreover, we need to find an invertible 
matrix of that size that satisfies two conditions: all entries of the first column 
are equal to 1, and all other entries are complex exponentials that can be 
implemented using the phase-shifting ability of a reconfigurable surface. Both 
properties are satisfied by the DFT matrix in (2.198) if it is scaled properly: 


$$
\boldsymbol{\Psi} = \sqrt{N+1}\,\mathbf{F}_{N+1}. \tag{9.43}
$$
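For this particular choice, $\boldsymbol{\Psi}^{-1} = \boldsymbol{\Psi}^H/(N+1)$ since the DFT matrix is unitary, so the ML estimate in (9.42) can be expressed as

$$
\hat{\mathbf{h}} = \frac{1}{\sqrt{q}(N+1)}\boldsymbol{\Psi}^H\big(\boldsymbol{\Psi}\bar{\mathbf{h}}\sqrt{q} + \mathbf{n}\big) = \bar{\mathbf{h}} + \frac{1}{\sqrt{q(N+1)}}\mathbf{F}_{N+1}^H\mathbf{n}, \tag{9.44}
$$

which is the true channel vector plus a scaled noise term with i.i.d. entries distributed as $\mathcal{N}_{\mathbb{C}}(0, N_0/(q(N+1)))$. The estimation error vanishes as $q \to \infty$, as expected from a well-crafted estimator.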
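The estimator in (9.40)-(9.44) can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' code: it assumes i.i.d. $\mathcal{N}_{\mathbb{C}}(0, N_0)$ noise, the DFT-based configurations in (9.43), and hypothetical helper names (`dft_configurations`, `receive`, `ml_estimate`). It exploits $\boldsymbol{\Psi}^{-1} = \boldsymbol{\Psi}^H/(N+1)$ instead of a general matrix inverse.

```python
import cmath
import math
import random


def dft_configurations(n_plus_1):
    """Rows psi[1], ..., psi[L_p] of Psi = sqrt(N+1) F_{N+1} with L_p = N+1, cf. (9.43).

    Entry (l, k) is exp(-j 2 pi l k / (N+1)): the first column is all ones (static path)
    and every entry has unit magnitude, so each row is a feasible surface configuration.
    """
    return [[cmath.exp(-2j * math.pi * l * k / n_plus_1) for k in range(n_plus_1)]
            for l in range(n_plus_1)]


def receive(psi_matrix, h_bar, q, n0, rng=None):
    """Pilot observations (9.40): y = Psi h_bar sqrt(q) + n with n ~ CN(0, N0 I)."""
    if rng is None:
        rng = random.Random()
    y = []
    for row in psi_matrix:
        noise = complex(rng.gauss(0, 1), rng.gauss(0, 1)) * math.sqrt(n0 / 2)
        y.append(sum(p * h for p, h in zip(row, h_bar)) * math.sqrt(q) + noise)
    return y


def ml_estimate(y, psi_matrix, q):
    """ML estimate (9.42) using Psi^{-1} = Psi^H / (N+1), which holds for the DFT choice only."""
    n_plus_1 = len(psi_matrix[0])
    return [sum(psi_matrix[l][k].conjugate() * y[l] for l in range(len(y)))
            / (math.sqrt(q) * n_plus_1) for k in range(n_plus_1)]
```

For example, `dft_configurations(201)` produces the 201 configurations needed for $N = 200$ metaatoms. Averaging $|[\hat{\mathbf{h}}]_n - [\bar{\mathbf{h}}]_n|^2$ over many noise realizations should approach $N_0/(q(N+1))$, in line with (9.44).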
Example 9.6. Can an ML estimate of $\bar{\mathbf{h}}$ be computed if $L_p \neq N+1$?

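Yes, we can always do our best to maximize the PDF in (9.41), even if the performance varies. If $L_p > N+1$, we can pick $\boldsymbol{\Psi}$ as a matrix with full column rank. We still want to solve the linear system of equations $\mathbf{y} - \boldsymbol{\Psi}\bar{\mathbf{h}}\sqrt{q} = \mathbf{0}$ with respect to $\bar{\mathbf{h}}$. This system is overdetermined and might lack a solution. However, the channel vector can only be observed in the subspace spanned by the columns of $\boldsymbol{\Psi}$; thus, we can project the equation onto that subspace and then solve it:

$$
\boldsymbol{\Psi}^H\mathbf{y} - \boldsymbol{\Psi}^H\boldsymbol{\Psi}\hat{\mathbf{h}}\sqrt{q} = \mathbf{0} \;\Rightarrow\; \hat{\mathbf{h}} = \frac{1}{\sqrt{q}}\big(\boldsymbol{\Psi}^H\boldsymbol{\Psi}\big)^{-1}\boldsymbol{\Psi}^H\mathbf{y} = \bar{\mathbf{h}} + \frac{1}{\sqrt{q}}\big(\boldsymbol{\Psi}^H\boldsymbol{\Psi}\big)^{-1}\boldsymbol{\Psi}^H\mathbf{n}. \tag{9.45}
$$

The matrix $(\boldsymbol{\Psi}^H\boldsymbol{\Psi})^{-1}\boldsymbol{\Psi}^H$ is called the left pseudo-inverse of $\boldsymbol{\Psi}$. The estimate is more precise than with $L_p = N+1$ since it builds on more observations.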

If $L_p < N+1$, the linear system of equations $\mathbf{y} - \boldsymbol{\Psi}\bar{\mathbf{h}}\sqrt{q} = \mathbf{0}$ is underdetermined and has many solutions. Instead of picking an arbitrary solution, which leads to estimation errors that remain as $q \to \infty$, it can be desirable to reformulate the entire problem to reduce the number of unknowns. We can group $N_s$ adjacent metaatoms together into a subarray that must use the same phase-shift value, motivated by the fact that the optimal phase-shift pattern often varies slowly over the surface. A single channel coefficient can then represent the cascaded channel through one subarray. Hence, in the reformulated estimation problem, detailed in [160], there are only $N/N_s + 1$ unknown coefficients. For any $L_p \geq 2$, we can pick the subarray size $N_s$ such that $L_p \geq N/N_s + 1$ to avoid an underdetermined estimation problem.
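For $L_p > N+1$, the left pseudo-inverse in (9.45) can be applied without forming it explicitly, by solving $(\boldsymbol{\Psi}^H\boldsymbol{\Psi})\hat{\mathbf{h}} = \boldsymbol{\Psi}^H\mathbf{y}/\sqrt{q}$. The sketch below assumes a hypothetical full-rank choice of $\boldsymbol{\Psi}$ with a leading column of ones and random unit-modulus entries elsewhere; the text does not prescribe how $\boldsymbol{\Psi}$ is chosen in this case. A small Gaussian elimination keeps it dependency-free; the function names are illustrative.

```python
import cmath
import math
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def random_configurations(lp, n_plus_1, rng=None):
    """Hypothetical full-rank Psi (almost surely): rows [1, e^{j theta_1}, ..., e^{j theta_N}]."""
    if rng is None:
        rng = random.Random()
    return [[1 + 0j] + [cmath.exp(1j * rng.uniform(-math.pi, math.pi))
                        for _ in range(n_plus_1 - 1)] for _ in range(lp)]


def ls_estimate(y, psi_matrix, q):
    """Estimate (9.45): solve (Psi^H Psi) h_hat = Psi^H y / sqrt(q)."""
    lp, k = len(psi_matrix), len(psi_matrix[0])
    gram = [[sum(psi_matrix[l][r].conjugate() * psi_matrix[l][c] for l in range(lp))
             for c in range(k)] for r in range(k)]
    rhs = [sum(psi_matrix[l][r].conjugate() * y[l] for l in range(lp)) / math.sqrt(q)
           for r in range(k)]
    return solve(gram, rhs)
```

If the columns of $\boldsymbol{\Psi}$ happen to be orthogonal (e.g., the first $N+1$ columns of an $L_p$-point DFT matrix, another assumption), then $\boldsymbol{\Psi}^H\boldsymbol{\Psi} = L_p\mathbf{I}$ and the estimate reduces to $\boldsymbol{\Psi}^H\mathbf{y}/(\sqrt{q}L_p)$, with noise variance $N_0/(qL_p)$ per entry, smaller than the $N_0/(q(N+1))$ obtained with $L_p = N+1$.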


The ML estimate is computed at the receiver, not at the reconfigurable surface that needs it. A possible solution is that the receiver (e.g., the base
station) computes the estimate, then determines the desirable configuration 
by putting the estimates into (9.27), and finally feeds this information back 
to the surface. This procedure is illustrated in Figure 9.14. The feedback link 
requires the reconfigurable surface to be equipped with a transceiver. 

We will now compare the capacity-maximizing configuration (based on perfect CSI) with the capacities obtained when the reconfigurable surface is tuned based on the ML estimate (imperfect CSI) and when random phase-shifts uniformly distributed in $[-\pi, \pi)$ are used. Since randomness affects $h$ in the latter cases, we present the average capacity values in Figure 9.15. These values are computed assuming the receiver knows $h$ perfectly during data transmission, so the randomness only affects how the surface is configured. We consider $N = 200$ metaatoms, $\beta_s = -80$ dB, $\beta_t = -60$ dB, and $\beta_r = -110$ dB (as in Figure 9.11). The SNR shown on the horizontal axis is defined based on the static path as $q\beta_s/N_0$. The estimation accuracy is low when the SNR is small; thus, random phase-shifts give the same capacity as when the ML estimate is used for selecting the phases. As the SNR increases, the imperfect CSI curve improves rapidly, thanks to the better estimation accuracy, and converges to the perfect CSI curve. The gap to the case with random phase-shifts is then large. It is important to note that the SNR per metaatom, $q\beta_r/N_0$, is 30 dB smaller than what is shown on the horizontal axis; on the other hand, the 201-length pilot sequence ($L_p = N+1$) increases the SNR during the channel estimation by $10\log_{10}(201) \approx 23$ dB. In conclusion, assessing what SNR value is small versus large in this context is complicated.

Figure 9.14: A reconfigurable surface can be configured by letting the transmitter repeat a pilot transmission while the surface reflects it using different predefined configurations. The receiver then computes an estimate of the channel vector and uses it to compute the desirable configuration. This information is then sent to the surface using a feedback link.

Figure 9.15: The capacity as a function of the SNR, considering a SISO channel aided by a reconfigurable surface with $N = 200$ metaatoms. Three different configurations are compared: the capacity-maximizing one based on perfect CSI, one based on the ML estimator (imperfect CSI), and one using random phase-shifts.

The ML estimator derived and discussed above is non-parametric, which means that we look for any conceivable channel vector in $\mathbb{C}^{N+1}$. When the ML
estimation framework was previously applied in Section 4.2.5, we restricted 
the search to LOS channels that are parametrized by the angle-of-arrival to a 
multi-antenna receiver. A similar parametric ML estimator can be developed 
when the channels to and from the surface are array response vectors, but we 
refer to [161] for the precise details. 


9.3 Wideband Communication using Reconfigurable Surfaces 


In this section, we will analyze how reconfigurable surfaces can be utilized 
to enhance communication over wideband channels. To ensure we capture 
the essential new characteristics, we must revisit how practical continuous 
passband channels were transformed into discrete complex baseband chan- 
nels in Section 2.3. The previous analysis considered the general setup in 
Figure 9.16(a), where a passband signal $z_p(t)$ is transmitted over a wireless channel and $v_p(t)$ denotes the filtered version that reaches the receiver before noise is added to it. The channel was described by the impulse response $g_p(t)$, and it depends on the propagation environment that the reconfigurable surface can control. Hence, we will now denote the impulse response as $g_{p,\boldsymbol{\psi}}(t)$, where the vector $\boldsymbol{\psi}$ represents the surface configuration.

The end-to-end channel impulse response $g_{p,\boldsymbol{\psi}}(t)$ is the sum of the impulse responses of the different propagation paths. We begin by defining the impulse response $g_{s,p}(t)$ of the static LTI channel that the signal propagates over in the absence of the reconfigurable surface. The transmitted signal $z_p(t)$ also propagates to each of the $N$ metaatoms in the reconfigurable surface through a separate LTI channel represented by an arbitrary impulse response $g_{t,n,p}(t)$, for $n = 1,\ldots,N$. When the signal reaches metaatom $n$, it will be filtered by its internal circuitry and then reradiated. The filtering happens in the analog domain and will be modeled as an LTI filter. We denote the impulse response as $\vartheta_{n,p,\psi_n}(t)$ and stress that it is reconfigurable in the sense that it depends on an external stimulus represented by the variable $\psi_n$ from the vector $\boldsymbol{\psi}$. In other words, we can choose from a set of possible impulse responses by selecting $\psi_n$. To be consistent with the LTI assumption, only one value of $\psi_n$ can be used during the considered signal transmission.
(a) The general relation between the transmitted and received passband signals.

(b) A detailed channel description when utilizing a reconfigurable surface.

Figure 9.16: The received passband signal $v_p(t)$ is the convolution between the transmitted signal $z_p(t)$ and the channel impulse response $g_p(t)$, as shown in (a). When utilizing a reconfigurable surface with $N$ metaatoms, the impulse response is the superposition/addition of the impulse response $g_{s,p}(t)$ of the static channel and the $N$ controllable impulse responses of the channels via each of the $N$ metaatoms, as shown in (b).


When the signal is reradiated from metaatom $n$, it propagates to the receiver over yet another LTI channel with an arbitrary impulse response $g_{r,n,p}(t)$. Since the transmitted signal propagates via metaatom $n$ to the receiver through a cascade of three LTI filters, the joint impulse response is the convolution of their impulse responses: $(g_{r,n,p} * \vartheta_{n,p,\psi_n} * g_{t,n,p})(t)$. We thereby obtain the input-output relation illustrated in Figure 9.16(b):


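$$
\begin{aligned}
v_p(t) &= (g_{s,p} * z_p)(t) + \sum_{n=1}^{N} \big(g_{r,n,p} * \vartheta_{n,p,\psi_n} * g_{t,n,p} * z_p\big)(t) \\
&= \Bigg(\underbrace{\Bigg[g_{s,p} + \sum_{n=1}^{N} g_{r,n,p} * \vartheta_{n,p,\psi_n} * g_{t,n,p}\Bigg]}_{= g_{p,\boldsymbol{\psi}}} * z_p\Bigg)(t),
\end{aligned} \tag{9.46}
$$

where we identify the impulse response of the end-to-end system as

$$
g_{p,\boldsymbol{\psi}}(t) = g_{s,p}(t) + \sum_{n=1}^{N} \big(g_{r,n,p} * \vartheta_{n,p,\psi_n} * g_{t,n,p}\big)(t). \tag{9.47}
$$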
We recall from (2.116)-(2.117) that filtering of a passband signal can be represented by filtering the equivalent complex-baseband signal using impulse responses that are downshifted. By applying this principle to each component of $g_{p,\boldsymbol{\psi}}(t)$, the complex-baseband representation of (9.46) becomes

$$
v(t) = (g_s * z)(t) + \sum_{n=1}^{N} \big(g_{r,n} * \vartheta_{n,\psi_n} * g_{t,n} * z\big)(t), \tag{9.48}
$$

where the impulse responses of the downshifted channels and filters are defined as

$$
g_s(t) = g_{s,p}(t)e^{-j2\pi f_c t}, \tag{9.49}
$$

$$
g_{t,n}(t) = g_{t,n,p}(t)e^{-j2\pi f_c t}, \tag{9.50}
$$

$$
g_{r,n}(t) = g_{r,n,p}(t)e^{-j2\pi f_c t}, \tag{9.51}
$$

$$
\vartheta_{n,\psi_n}(t) = \vartheta_{n,p,\psi_n}(t)e^{-j2\pi f_c t}. \tag{9.52}
$$

The end-to-end channel has the impulse response

$$
g_{\boldsymbol{\psi}}(t) = g_s(t) + \sum_{n=1}^{N} \big(g_{r,n} * \vartheta_{n,\psi_n} * g_{t,n}\big)(t). \tag{9.53}
$$


We notice that the convolution of a chain of impulse responses in the passband 
becomes the convolution of the corresponding chain of complex-baseband 
impulse responses. This property seems natural but is actually a feature of 
the definitions previously made in Section 2.3.1. We considered a passband 
signal that is sent over a channel with arbitrary frequency support and defined 
how the signal and channel/filter are transformed to the baseband differently. 
This can be called the pseudo-baseband representation since the channel is 
not a baseband filter, but the output is a baseband signal since the input 
is a baseband signal. By contrast, many other textbooks consider a stricter 
complex-baseband representation, where each channel is a passband filter that 
is transformed to the baseband identically to the signal. That definition is less 
practical since wireless channels are not passband filters but support signals 
of any frequency. More importantly, it gives rise to extra scaling factors, and 
these multiply when considering convolutions of filters, making the stricter 
model inappropriate when studying reconfigurable surfaces. 
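The property that downshifting every impulse response in a chain is equivalent to downshifting their convolution can be checked numerically. The sketch below is a discrete-time analog chosen for illustration (an assumption, not the continuous-time model of the text): sample index $k$ plays the role of time and $e^{-j\omega k}$ plays the role of $e^{-j2\pi f_c t}$. The identity holds exactly because $e^{-j\omega k} = e^{-j\omega i}e^{-j\omega(k-i)}$ inside each convolution sum, so no extra scaling factors appear.

```python
import cmath


def convolve(x, h):
    """Linear convolution of two finite sequences (index 0 corresponds to time 0)."""
    out = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            out[i + k] += xi * hk
    return out


def downshift(x, omega):
    """Multiply sample k by exp(-j omega k), a discrete-time analog of g_p(t) e^{-j 2 pi f_c t}."""
    return [xk * cmath.exp(-1j * omega * k) for k, xk in enumerate(x)]
```

Downshifting the cascade `convolve(convolve(g_r, theta), g_t)` gives the same sequence as convolving the individually downshifted filters, mirroring how (9.47) maps to (9.53).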

When a discrete sequence of data symbols $z[k]$ is transmitted using PAM, bandpass filtered at the receiver, and sampled at the symbol rate, the resulting received signal sequence $y[l]$ was expressed in (7.7) as

$$
y[l] = \sum_{\ell=0}^{T} h_{\boldsymbol{\psi}}[\ell]\, z[l-\ell] + n[l], \tag{9.54}
$$

where the $T+1$ discrete-time channel coefficients $h_{\boldsymbol{\psi}}[0],\ldots,h_{\boldsymbol{\psi}}[T]$ are computed based on the end-to-end channel model in (9.53) as

$$
\begin{aligned}
h_{\boldsymbol{\psi}}[\ell] &= (p * g_{\boldsymbol{\psi}} * p)(t)\Big|_{t=\ell/B} \\
&= (p * g_s * p)(t)\Big|_{t=\ell/B} + \sum_{n=1}^{N} \big(p * g_{r,n} * \vartheta_{n,\psi_n} * g_{t,n} * p\big)(t)\Big|_{t=\ell/B}.
\end{aligned} \tag{9.55}
$$

Figure 9.17: An example of a lumped-element model of a metaatom containing two inductors ($L_1$, $L_2$), one resistor ($R$), and a varactor with a controllable capacitance $C(v_n)$.


Conventional propagation models can be used for the impulse responses $g_s(t)$, $g_{t,n}(t)$, $g_{r,n}(t)$ of the wireless channels, by accounting for the effective areas of antennas and metaatoms. To compute an expression of (9.55), we must also characterize the impulse response $\vartheta_{n,\psi_n}(t)$ of a metaatom.


9.3.1 Impulse Response of a Metaatom 


We will showcase a basic model of the impulse response $\vartheta_{n,p,\psi_n}(t)$ of a metaatom by analyzing a practical implementation. Figure 9.17 shows a lumped-element model of metaatom $n$ containing two parallel branches: the first contains an inductor with inductance $L_1$, and the second contains a series connection of an inductor with inductance $L_2$, a resistor with resistance $R$, and a varactor with a capacitance $C(v_n)$ controlled by the bias voltage $v_n$. This parallel resonance circuit is a simplified version of the metaatom design in [162] and was considered for reconfigurable surfaces in [163]. Using circuit theory methods, the impedance of the metaatom can be shown to be


$$
Z_n(v_n) = \frac{j2\pi f L_1\left(j2\pi f L_2 + R + \frac{1}{j2\pi f C(v_n)}\right)}{j2\pi f L_1 + j2\pi f L_2 + R + \frac{1}{j2\pi f C(v_n)}} \tag{9.56}
$$

for a signal with the frequency $f$. We can compute the frequency-dependent reflection coefficient by substituting this expression into (9.13).

Figure 9.18 shows the frequency response of the metaatom for frequencies around an intended carrier frequency of $f_c = 3$ GHz. The parameters of the lumped-element model in Figure 9.17 are $L_1 = 2.5$ nH, $L_2 = 0.7$ nH, $Z_0 = 377$ ohm, and $R = 1$ ohm. Since the reflection coefficient is complex, the phase and amplitude responses are presented in (a) and (b), respectively. There are curves in Figure 9.18(a) for four different capacitance values obtained by controlling the bias voltage of the varactor. The specific values have been selected to give the phase-shifts $3\pi/4$, $\pi/4$, $-\pi/4$, $-3\pi/4$ at the carrier frequency. The phase begins close to $+\pi$ on all the curves because the reradiated electric field is inverted (upside down). The phase variations are large when considering the GHz range, which is natural for all filters. The simplest representation of reflection would be a pure time delay $\tau$, which has the impulse response $\delta(t-\tau)$ and frequency response $e^{-j2\pi f\tau}$ with a phase that
varies linearly with the frequency. Since the curves are approximately linear 
next to the carrier frequency, this model is suitable if the signal bandwidth 
B is limited to a few hundred MHz. Outside this range, the phase response 
curves have non-linear shapes that will distort the signal in the time domain; 
thus, a metaatom has a limited useful bandwidth range. As long as the system 
uses a smaller bandwidth, the reflected signal will be undistorted, and the 
propagation channels will determine whether the communication system is 
narrowband or wideband. The four curves have roughly the same shape but 
are shifted in the frequency domain; it is this shift that the varactor controls. 

The amplitude responses are shown in Figure 9.18(b) for the same capacitance values, but different resistances: $R = 1$ ohm and $R = 0$ ohm. The theoretical maximum amplitude response is 0 dB since the metaatom is a passive circuit that reflects the signal without amplification.⁴ All the signal power is reflected when the resistance is negligible, while there are a few dB of amplitude losses when the resistance is non-zero. In the latter case, the amplitude loss is also frequency-dependent. The loss is largest at the frequency where the phase response is zero due to resonance in the circuit. While building metaatoms with minimal reflection losses is desirable, we should keep in mind that a few dB is minor compared to the propagation losses over wireless channels, which are typically on the order of 100 dB.
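The metaatom response can be explored numerically by evaluating (9.56) together with a reflection coefficient. Equation (9.13) is not reproduced in this section, so the sketch assumes the common form $\Gamma = (Z_n - Z_0)/(Z_n + Z_0)$; the capacitance of 1 pF in the usage lines is a hypothetical value, not one of the four values behind Figure 9.18. The remaining parameters follow the text: $L_1 = 2.5$ nH, $L_2 = 0.7$ nH, $Z_0 = 377$ ohm, and $R = 1$ ohm.

```python
import cmath
import math


def metaatom_impedance(f, c, l1=2.5e-9, l2=0.7e-9, r=1.0):
    """Impedance (9.56): inductor L1 in parallel with the series L2-R-C(v_n) branch."""
    w = 2 * math.pi * f
    z1 = 1j * w * l1                          # branch 1: inductor L1
    z2 = 1j * w * l2 + r + 1 / (1j * w * c)   # branch 2: L2, R, and varactor in series
    return z1 * z2 / (z1 + z2)                # parallel connection


def reflection_coefficient(f, c, z0=377.0, **circuit):
    """Assumed form of (9.13): Gamma = (Z - Z0) / (Z + Z0)."""
    z = metaatom_impedance(f, c, **circuit)
    return (z - z0) / (z + z0)


for f in (2.5e9, 3.0e9, 3.5e9):
    g = reflection_coefficient(f, 1e-12)  # C = 1 pF is a hypothetical capacitance value
    print(f"{f / 1e9:.1f} GHz: |Gamma| = {20 * math.log10(abs(g)):.2f} dB, "
          f"phase = {cmath.phase(g):+.3f} rad")
```

With `r=0.0`, the impedance is purely reactive and $|\Gamma| = 1$ (0 dB) at every frequency, matching the lossless curves in Figure 9.18(b); at very low frequencies the inductor $L_1$ nearly shorts the circuit and the phase approaches $+\pi$.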

In summary, an ideal metaatom design has no amplitude losses and a linear phase within the signal band. Hence, its impulse response can be expressed as $\vartheta_{n,p,\psi_n}(t) = \delta(t - \tau_{\psi_n})$ in the passband, which results in the downshifted filter

$$
\vartheta_{n,\psi_n}(t) = \delta(t - \tau_{\psi_n})\,e^{-j2\pi f_c \tau_{\psi_n}}, \tag{9.57}
$$

where the controllable delay is denoted by $\tau_{\psi_n}$. We will soon show that (9.57) results in a phase-shift of $\psi_n = -2\pi f_c\tau_{\psi_n}$ in the system model; thus, the phase-shift caused by a metaatom is controlled by tuning the reflection delay.
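The relation $\psi_n = -2\pi f_c\tau_{\psi_n}$ can be inverted to see which reflection delays are needed. This sketch assumes we pick the smallest non-negative delay for each target phase (phases are only defined modulo $2\pi$) and an example bandwidth $B = 100$ MHz that is not taken from the text. It also prints $B\tau_{\psi_n}$, which stays far below 1 and thus supports neglecting $\tau_{\psi_n}$ relative to the symbol time $1/B$.

```python
import math


def reflection_delay(phase, fc):
    """Smallest delay tau >= 0 with -2*pi*fc*tau = phase (mod 2*pi), cf. psi_n = -2 pi f_c tau."""
    return ((-phase) % (2 * math.pi)) / (2 * math.pi * fc)


fc = 3e9    # carrier frequency used in Figure 9.18
B = 100e6   # hypothetical signal bandwidth, chosen only for illustration
for psi in (3 * math.pi / 4, math.pi / 4, -math.pi / 4, -3 * math.pi / 4):
    tau = reflection_delay(psi, fc)
    print(f"psi = {psi:+.3f} rad -> tau = {tau * 1e12:6.1f} ps, B*tau = {B * tau:.4f}")
```

All required delays are below one carrier period ($1/f_c \approx 333$ ps), so $B\tau_{\psi_n} < B/f_c$, which is small whenever $f_c \gg B$.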


⁴ The passiveness is a key feature since active components (e.g., an amplifier) add noise to the reflected signal, which is not the case when using reconfigurable surfaces. Retransmitting devices that contain amplifiers are normally referred to as repeaters or relays and must be studied using different system models than in this chapter.
(a) Phase response for different frequencies when $R = 1$ ohm.

(b) Amplitude response for different frequencies.

Figure 9.18: The frequency response when a metaatom with the impedance in (9.56) reflects a signal in free space. The curves are obtained for different capacitances of the varactor, which are selected to give the phase-shifts $3\pi/4$, $\pi/4$, $-\pi/4$, $-3\pi/4$ at 3 GHz. The phase response is shown in (a) and the amplitude response is shown in (b).
9.3.2 OFDM-Based Channel Model with a Reconfigurable Surface


In this section, we will determine the channel coefficients in an OFDM system aided by a reconfigurable surface. We can use the basic channel model from (2.124) to express a static channel with $L_s$ propagation paths as

$$
g_{s,p}(t) = \sum_{i=1}^{L_s} \alpha_{s,i}\,\delta(t + \eta - \tau_{s,i}) \quad \Rightarrow \quad g_s(t) = \sum_{i=1}^{L_s} \alpha_{s,i}\,e^{-j2\pi f_c(\tau_{s,i}-\eta)}\,\delta(t + \eta - \tau_{s,i}), \tag{9.58}
$$

where $\alpha_{s,i} \in [0,1]$ is the attenuation and $\tau_{s,i} \geq 0$ is the delay of path $i$, for $i = 1,\ldots,L_s$. We recall that $\eta$ denotes the receiver's clock delay, which ensures


that the receiver takes samples when the signal reaches it and not when the 
signal leaves the transmitter. This parameter must be selected as described in 
Section 7.1 to achieve the causal FIR filter representation in (9.54). We denote the number of propagation paths between the transmitter and reconfigurable surface as $L_t$ and between the surface and receiver as $L_r$. Similarly to (9.58), we can then model the impulse responses to and from metaatom $n$ as $g_{t,n,p}(t) = \sum_{i=1}^{L_t} \alpha_{t,n,i}\,\delta(t - \tau_{t,n,i})$ and $g_{r,n,p}(t) = \sum_{j=1}^{L_r} \alpha_{r,n,j}\,\delta(t + \eta - \tau_{r,n,j})$. These can be expressed in the complex baseband as


$$
g_{t,n}(t) = \sum_{i=1}^{L_t} \alpha_{t,n,i}\,e^{-j2\pi f_c\tau_{t,n,i}}\,\delta(t - \tau_{t,n,i}), \tag{9.59}
$$

$$
g_{r,n}(t) = \sum_{j=1}^{L_r} \alpha_{r,n,j}\,e^{-j2\pi f_c(\tau_{r,n,j}-\eta)}\,\delta(t + \eta - \tau_{r,n,j}), \tag{9.60}
$$

where $\alpha_{t,n,i}, \alpha_{r,n,j} \in [0,1]$ are the attenuations and $\tau_{t,n,i}, \tau_{r,n,j} \geq 0$ are the propagation delays. Note that the receiver's timing delay $\eta$ is only included in the channels that lead to the receiver. By substituting the metaatom's impulse response in (9.57) and the channel impulse responses in (9.58)-(9.60) into (9.55), the channel coefficients can be computed as⁵


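$$
\begin{aligned}
h_{\boldsymbol{\psi}}[\ell] = &\sum_{i=1}^{L_s} \alpha_{s,i}\,e^{-j2\pi f_c(\tau_{s,i}-\eta)}\,\mathrm{sinc}\big(\ell + B(\eta - \tau_{s,i})\big) \\
&+ \sum_{n=1}^{N}\sum_{j=1}^{L_r}\sum_{i=1}^{L_t} \alpha_{r,n,j}\alpha_{t,n,i}\,e^{-j2\pi f_c(\tau_{r,n,j}+\tau_{t,n,i}+\tau_{\psi_n}-\eta)}\,\mathrm{sinc}\big(\ell + B(\eta - \tau_{r,n,j} - \tau_{t,n,i})\big)
\end{aligned} \tag{9.61}
$$

for $\ell = 0,\ldots,T$, by utilizing the facts that $(p * p)(t) = \mathrm{sinc}(Bt)$ and that the convolution between $\mathrm{sinc}(Bt)$ and $e^{-j2\pi f_c\tau}\delta(t-\tau)$ is $\mathrm{sinc}(B(t-\tau))\,e^{-j2\pi f_c\tau}$.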


⁵ The last sinc-term in (9.61) becomes $\mathrm{sinc}(\ell + B(\eta - \tau_{r,n,j} - \tau_{t,n,i} - \tau_{\psi_n}))$, but the term containing $\tau_{\psi_n}$ can be dropped since the delay caused by the reflection is negligible compared to the symbol time $1/B$, in the sense that $\tau_{\psi_n}/(1/B) = B\tau_{\psi_n} \approx 0$. The metaatom nevertheless creates a noticeable phase-shift since $f_c \gg B$.
We notice that (9.61) contains $L_s$ static paths from the transmitter to the receiver and $NL_tL_r$ paths involving the reconfigurable surface. Each of the latter paths has an attenuation $\alpha_{r,n,j}\alpha_{t,n,i}$ that is the product of the attenuation from the transmitter to the metaatom and from the metaatom to the receiver. Each such path is also associated with a phase-shift $e^{-j2\pi f_c(\tau_{r,n,j}+\tau_{t,n,i}+\tau_{\psi_n}-\eta)}$ containing the accumulated delays. The sinc-function determines how the signal energy carried by the path is divided between the $T+1$ channel taps.
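Equation (9.61) translates directly into code. The sketch below is a literal, unoptimized evaluation under the assumptions of this section: ideal metaatoms as in (9.57), the normalized sinc-function, and $\tau_{\psi_n}$ dropped inside the sinc as explained in the footnote to (9.61). The path-list format and the function name `channel_taps` are hypothetical choices for illustration.

```python
import cmath
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), matching (p * p)(t) = sinc(B t)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def channel_taps(static_paths, surface_paths, tau_psi, fc, B, eta, T):
    """Taps h_psi[0], ..., h_psi[T] from (9.61), with tau_psi dropped inside the sinc.

    static_paths:      list of (alpha_s_i, tau_s_i)
    surface_paths[n]:  (list of (alpha_t_n_i, tau_t_n_i), list of (alpha_r_n_j, tau_r_n_j))
    tau_psi[n]:        controllable reflection delay of metaatom n
    """
    taps = []
    for ell in range(T + 1):
        h = sum(a * cmath.exp(-2j * math.pi * fc * (tau - eta)) * sinc(ell + B * (eta - tau))
                for a, tau in static_paths)
        for (t_paths, r_paths), tp in zip(surface_paths, tau_psi):
            for a_r, tau_r in r_paths:
                for a_t, tau_t in t_paths:
                    delay = tau_r + tau_t
                    h += (a_r * a_t * cmath.exp(-2j * math.pi * fc * (delay + tp - eta))
                          * sinc(ell + B * (eta - delay)))
        taps.append(h)
    return taps
```

When a path delay equals $\eta + \ell_0/B$ for an integer $\ell_0$, all its energy lands in tap $\ell_0$; other delays spread the energy over several taps through the sinc-function.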

When the reconfigurable surface is in the far-field of the transmitter, receiver, and other objects in the propagation environment, we can represent the channel using array response vectors. We assume that the surface is a UPA with $N_V$ rows and $N_H$ metaatoms per row. The $i$th incident path to the surface can be associated with an angle pair $(\varphi_{i,i}, \theta_{i,i})$, measured from the broadside direction of the surface, and the $j$th outgoing path can be associated with an angle pair $(\varphi_{o,j}, \theta_{o,j})$. If we gather the $N$ phase-shifts related to such a path, they match with the array response vector expression in (4.128):


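$$
\mathbf{a}_{N_H,N_V}(\varphi_{i,i}, \theta_{i,i}) = \begin{bmatrix} 1 \\ e^{-j2\pi f_c(\tau_{t,2,i}-\tau_{t,1,i})} \\ \vdots \\ e^{-j2\pi f_c(\tau_{t,N,i}-\tau_{t,1,i})} \end{bmatrix}, \tag{9.62}
$$

$$
\mathbf{a}_{N_H,N_V}(\varphi_{o,j}, \theta_{o,j}) = \begin{bmatrix} 1 \\ e^{-j2\pi f_c(\tau_{r,2,j}-\tau_{r,1,j})} \\ \vdots \\ e^{-j2\pi f_c(\tau_{r,N,j}-\tau_{r,1,j})} \end{bmatrix}. \tag{9.63}
$$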


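Furthermore, the attenuation is the same for all the metaatoms: $\alpha_{t,n,i} = \alpha_{t,1,i}$ and $\alpha_{r,n,j} = \alpha_{r,1,j}$ for all $n$, where we take the first metaatom as the reference. For a given path, the delay variations across the surface are negligible in the sinc-functions, in the sense that $B\tau_{r,n,j} \approx B\tau_{r,1,j}$ and $B\tau_{t,n,i} \approx B\tau_{t,1,i}$ for all $n$, while the corresponding phase differences are retained in the array response vectors since $f_c \gg B$. We can utilize these properties to rewrite the channel coefficients in (9.61) as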


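$$
h_{\boldsymbol{\psi}}[\ell] = c_s[\ell] + \sum_{j=1}^{L_r}\sum_{i=1}^{L_t} c_{i,j}[\ell]\,\mathbf{a}_{N_H,N_V}^T(\varphi_{o,j}, \theta_{o,j})\,\mathbf{D}_{\boldsymbol{\psi}}\,\mathbf{a}_{N_H,N_V}(\varphi_{i,i}, \theta_{i,i}), \tag{9.64}
$$

where $\mathbf{D}_{\boldsymbol{\psi}} = \mathrm{diag}(e^{j\psi_1},\ldots,e^{j\psi_N})$ contains the controllable phase-shifts

$$
\psi_n = -2\pi f_c\tau_{\psi_n} \tag{9.65}
$$

created by each of the metaatoms, and the channel coefficients that depend on the tap index are gathered in the sequences

$$
c_s[\ell] = \sum_{i=1}^{L_s} \alpha_{s,i}\,e^{-j2\pi f_c(\tau_{s,i}-\eta)}\,\mathrm{sinc}\big(\ell + B(\eta - \tau_{s,i})\big), \tag{9.66}
$$

$$
c_{i,j}[\ell] = \alpha_{r,1,j}\alpha_{t,1,i}\,e^{-j2\pi f_c(\tau_{r,1,j}+\tau_{t,1,i}-\eta)}\,\mathrm{sinc}\big(\ell + B(\eta - \tau_{r,1,j} - \tau_{t,1,i})\big). \tag{9.67}
$$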
The channel coefficient at the $\ell$th tap has the same structure as the narrowband channel expression in (9.23), but the fact that there are multiple taps calls for a different kind of signal transmission. If we apply OFDM with $S$ data symbols per block and a cyclic prefix of length $T$, it follows from (7.25) that we obtain $S$ memoryless subcarriers of the kind

$$
\bar{y}[\nu] = h_{\boldsymbol{\psi}}[\nu]\,\bar{z}[\nu] + \bar{n}[\nu], \quad \text{for } \nu = 0,\ldots,S-1, \tag{9.68}
$$


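with the reconfigurable frequency-domain channel coefficients

$$
h_{\boldsymbol{\psi}}[\nu] = \bar{c}_s[\nu] + \sum_{j=1}^{L_r}\sum_{i=1}^{L_t} \bar{c}_{i,j}[\nu]\,\mathbf{a}_{N_H,N_V}^T(\varphi_{o,j}, \theta_{o,j})\,\mathbf{D}_{\boldsymbol{\psi}}\,\mathbf{a}_{N_H,N_V}(\varphi_{i,i}, \theta_{i,i}) \tag{9.69}
$$

that depend on the DFTs of the time-domain channel coefficients:

$$
\bar{c}_s[\nu] = \sum_{\ell=0}^{T} c_s[\ell]\,e^{-j2\pi\ell\nu/S}, \quad \nu = 0,\ldots,S-1, \tag{9.70}
$$

$$
\bar{c}_{i,j}[\nu] = \sum_{\ell=0}^{T} c_{i,j}[\ell]\,e^{-j2\pi\ell\nu/S}, \quad \nu = 0,\ldots,S-1. \tag{9.71}
$$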


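$$
h_{\boldsymbol{\psi}}[\nu] = \underbrace{\begin{bmatrix} 1 & e^{j\psi_1} & \cdots & e^{j\psi_N} \end{bmatrix}}_{=\boldsymbol{\psi}^T} \underbrace{\begin{bmatrix} \bar{c}_s[\nu] \\ \sum_{j=1}^{L_r}\sum_{i=1}^{L_t} \bar{c}_{i,j}[\nu]\,\mathbf{a}_{N_H,N_V}(\varphi_{o,j}, \theta_{o,j}) \odot \mathbf{a}_{N_H,N_V}(\varphi_{i,i}, \theta_{i,i}) \end{bmatrix}}_{=\bar{\mathbf{h}}[\nu]} \tag{9.72}
$$

where $\odot$ denotes the entry-wise product between two vectors. The expression $h_{\boldsymbol{\psi}}[\nu] = \boldsymbol{\psi}^T\bar{\mathbf{h}}[\nu]$ is the simplest we can obtain for the subcarrier channel coefficient when using a reconfigurable surface. It is the inner product between $\boldsymbol{\psi}^*$ and $\bar{\mathbf{h}}[\nu]$, where the former vector depends on the surface configuration, while the latter fully characterizes the channel on subcarrier $\nu$.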
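The subcarrier vectors in (9.72) can be assembled from the tap sequences by DFTs. A quick consistency check is that $\boldsymbol{\psi}^T\bar{\mathbf{h}}[\nu]$ must equal the $S$-point DFT of the time-domain taps in (9.64), since both operations are linear. The sketch below assumes the taps $c_s[\ell]$, $c_{i,j}[\ell]$ and the array response vectors are already available; the input format and function names are hypothetical.

```python
import cmath
import math


def dft(taps, S):
    """S-point DFT of taps c[0..T], as in (9.70)-(9.71): sum_l c[l] exp(-j 2 pi l nu / S)."""
    return [sum(c * cmath.exp(-2j * math.pi * l * nu / S) for l, c in enumerate(taps))
            for nu in range(S)]


def subcarrier_vectors(cs_taps, surface_terms, S):
    """Channel vectors h_bar[0], ..., h_bar[S-1] from (9.72).

    cs_taps:       static taps c_s[0..T]
    surface_terms: list of (c_ij taps, a_out, a_in), one entry per path pair (i, j),
                   where a_out and a_in are length-N array response vectors
    """
    cs_bar = dft(cs_taps, S)
    c_bars = [dft(taps, S) for taps, _, _ in surface_terms]
    N = len(surface_terms[0][1])
    h_bars = []
    for nu in range(S):
        surface_part = [0j] * N
        for c_bar, (_, a_out, a_in) in zip(c_bars, surface_terms):
            for n in range(N):
                # entry-wise product of the outgoing and incident array responses
                surface_part[n] += c_bar[nu] * a_out[n] * a_in[n]
        h_bars.append([cs_bar[nu]] + surface_part)
    return h_bars
```

Once the $S$ vectors are computed, evaluating a new surface configuration only costs $S$ inner products of length $N+1$, which is what the configuration algorithms below exploit.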


9.3.3 Wideband Capacity Maximization 


The capacity of a SISO-OFDM system was presented in Theorem 7.1 with arbitrary but static channel coefficients. For a given surface configuration $\boldsymbol{\psi}$, as defined in (9.72), the capacity becomes

$$
C = \frac{B}{S+T}\sum_{\nu=0}^{S-1}\log_2\left(1 + \frac{q_\nu|h_{\boldsymbol{\psi}}[\nu]|^2}{N_0}\right) \text{ bit/s}, \tag{9.73}
$$

where $T$ is the length of the cyclic prefix,

$$
q_\nu = \max\left(\mu - \frac{N_0}{|h_{\boldsymbol{\psi}}[\nu]|^2}, 0\right), \quad \nu = 0,\ldots,S-1, \tag{9.74}
$$

and the variable $\mu$ is selected to make $\sum_{\nu=0}^{S-1} q_\nu = qS$.
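The capacity in (9.73) requires the water level $\mu$ in (9.74). The sketch below finds it by sorting the inverse gains and growing the set of active subcarriers, which is one standard water-filling procedure (our choice here; the text only states the solution). The function and argument names are hypothetical.

```python
import math


def water_filling(gains, n0, total_power):
    """Powers q_nu = max(mu - N0 / gain_nu, 0) with sum(q_nu) = total_power, cf. (9.74)."""
    inv = sorted(n0 / g for g in gains if g > 0)
    mu = 0.0
    for k in range(1, len(inv) + 1):
        candidate = (total_power + sum(inv[:k])) / k  # water level if the k best are active
        if k == len(inv) or candidate <= inv[k]:
            mu = candidate
            break
    return [max(mu - n0 / g, 0.0) if g > 0 else 0.0 for g in gains]


def ofdm_capacity(gains, n0, q, bandwidth, cp_length):
    """Capacity (9.73) in bit/s, where gains[nu] = |h_psi[nu]|^2 and S = len(gains)."""
    S = len(gains)
    powers = water_filling(gains, n0, q * S)
    return bandwidth / (S + cp_length) * sum(
        math.log2(1 + p * g / n0) for p, g in zip(powers, gains))
```

The same functions evaluate the upper bound (9.80) if `gains` contains $\|\bar{\mathbf{h}}[\nu]\|_1^2$ instead of $|h_{\boldsymbol{\psi}}[\nu]|^2$.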

The capacity value in (9.73) depends on the configuration $\boldsymbol{\psi}$. Corollary 9.1 showed which configuration maximizes the SNR in the narrowband case, which also leads to the maximum capacity in that case. The solution was to select $\boldsymbol{\psi}$ so that the $N+1$ terms in the inner product $\boldsymbol{\psi}^T\bar{\mathbf{h}}$, where $\bar{\mathbf{h}}$ was defined in (9.38), get the same phase. The optimization task is more challenging in the wideband OFDM scenario because there are $S$ different SNRs among the $S$ subcarriers. The SNR on subcarrier $\nu$ is proportional to $|\boldsymbol{\psi}^T\bar{\mathbf{h}}[\nu]|^2$, where the channel vector $\bar{\mathbf{h}}[\nu]$ is subcarrier-dependent while the surface configuration vector $\boldsymbol{\psi}$ is not. This resembles the analog beamforming
situation in Section 7.3.1, where the same beamforming vector had to be 
used on all subcarriers. The wideband capacity maximization problem is 
computationally challenging but is partially addressed in [164], [165]. In this 
section, we will cover a suboptimal but effective way to configure the surface 
in these situations. 

The optimal surface configuration in the narrowband case maximizes the channel gain. We can aim to do the same in the wideband case by maximizing the total channel gain over all subcarriers:

$$
\sum_{\nu=0}^{S-1}|h_{\boldsymbol{\psi}}[\nu]|^2 = \sum_{\nu=0}^{S-1}|\boldsymbol{\psi}^T\bar{\mathbf{h}}[\nu]|^2 = \boldsymbol{\psi}^H\underbrace{\left(\sum_{\nu=0}^{S-1}\bar{\mathbf{h}}^*[\nu]\bar{\mathbf{h}}^T[\nu]\right)}_{=\mathbf{A}}\boldsymbol{\psi}. \tag{9.75}
$$

If we could pick $\boldsymbol{\psi}$ as any vector with a given norm, this quadratic form would be maximized by selecting $\boldsymbol{\psi}$ as a scaled version of the dominant eigenvector of $\mathbf{A} = \sum_{\nu=0}^{S-1}\bar{\mathbf{h}}^*[\nu]\bar{\mathbf{h}}^T[\nu]$ (associated with the largest eigenvalue). However, we have a stricter constraint on the configuration vector because the first entry must be 1, and the remaining ones must have unit magnitude. There is no simple way to maximize (9.75) under this constraint, but an efficient iterative algorithm was proposed in [166]. The starting point is the power iteration method [167], which finds the dominant eigenvector of $\mathbf{A}$ by the iterative computation

$$
\mathbf{w}_{i+1} = \frac{\mathbf{A}\mathbf{w}_i}{\|\mathbf{A}\mathbf{w}_i\|}, \quad i = 0, 1, \ldots, \tag{9.76}
$$

which is initialized from a non-zero vector $\mathbf{w}_0 \in \mathbb{C}^{N+1}$ that must have a non-zero component along the dominant eigenvector (as is almost always the case for an arbitrary choice). In each iteration, the multiplication $\mathbf{A}\mathbf{w}_i$ amplifies the component of $\mathbf{w}_i$ that is aligned with the dominant eigenvector relative to all other components. The convergence speed of the power iteration method depends on how much larger the largest eigenvalue is compared to the second largest eigenvalue.
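A minimal sketch of the unconstrained power iteration (9.76) for a matrix stored as a list of rows. The fixed iteration count is an assumption made for simplicity; a practical implementation would typically stop when successive iterates barely change.

```python
import math


def power_iteration(A, w0, iterations=100):
    """Power iteration (9.76): w_{i+1} = A w_i / ||A w_i||."""
    w = list(w0)
    for _ in range(iterations):
        v = [sum(a * x for a, x in zip(row, w)) for row in A]  # A w_i
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        w = [x / norm for x in v]
    return w
```

For $\mathbf{A} = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$, with eigenvalues 4 and 2, starting from $[1, 0]^T$ the iterate approaches $[1, 1]^T/\sqrt{2}$ and the unwanted component shrinks by the factor $2/4$ per iteration, illustrating the dependence on the eigenvalue ratio.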


A modified power iteration is described in Algorithm 9.1, which is initialized from $\boldsymbol{\psi}_0 = [1,\ldots,1]^T$, where the surface is not changing the phases. In each iteration, the computation in (9.76) is made using $\mathbf{w}_i = \boldsymbol{\psi}_i$. The result is used to determine the next surface configuration $\boldsymbol{\psi}_{i+1}$ by only keeping the phases of the entries of $\mathbf{w}_{i+1}$ while replacing their magnitudes by 1. Since the same value is obtained in (9.75) for $\boldsymbol{\psi}$ and $e^{-j\phi}\boldsymbol{\psi}$, for any common phase-shift $\phi$, we can shift the phase of all the entries so that the first entry in $\boldsymbol{\psi}_{i+1}$ becomes 1. This shift is necessary since the phase of the static channel cannot be modified. Note that $[\mathbf{w}]_n$ denotes the $n$th entry of $\mathbf{w}$ in the algorithm.


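Example 9.7. Suppose there is only one path to and from the reconfigurable surface (i.e., $L_t = L_r = 1$). Which phase-shift configuration maximizes (9.75)?

Under these conditions, the channel vector in (9.72) simplifies to

$$
\bar{\mathbf{h}}[\nu] = \begin{bmatrix} \bar{c}_s[\nu] \\ \bar{c}[\nu]\,\mathbf{a} \end{bmatrix}, \tag{9.77}
$$

where $\mathbf{a} = \mathbf{a}_{N_H,N_V}(\varphi_o, \theta_o) \odot \mathbf{a}_{N_H,N_V}(\varphi_i, \theta_i)$ and we dropped the path indices. Hence, the matrix $\mathbf{A}$ in (9.75) can be expressed as

$$
\mathbf{A} = \sum_{\nu=0}^{S-1}\begin{bmatrix} \bar{c}_s^*[\nu] \\ \bar{c}^*[\nu]\,\mathbf{a}^* \end{bmatrix}\begin{bmatrix} \bar{c}_s[\nu] & \bar{c}[\nu]\,\mathbf{a}^T \end{bmatrix} = \begin{bmatrix} b_{ss} & b_s\mathbf{a}^T \\ b_s^*\mathbf{a}^* & b\,\mathbf{a}^*\mathbf{a}^T \end{bmatrix}, \tag{9.78}
$$

where $b_{ss} = \sum_{\nu=0}^{S-1}|\bar{c}_s[\nu]|^2 \geq 0$, $b = \sum_{\nu=0}^{S-1}|\bar{c}[\nu]|^2 \geq 0$, and $b_s = \sum_{\nu=0}^{S-1}\bar{c}_s^*[\nu]\,\bar{c}[\nu]$.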
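If the phase-shift configuration is expressed as $\boldsymbol{\psi} = [1, \mathbf{w}^T]^T$, then

$$
\begin{aligned}
\boldsymbol{\psi}^H\mathbf{A}\boldsymbol{\psi} &= b_{ss} + b_s\mathbf{a}^T\mathbf{w} + b_s^*\mathbf{w}^H\mathbf{a}^* + b\,\mathbf{w}^H\mathbf{a}^*\mathbf{a}^T\mathbf{w} \\
&= b_{ss} + 2\Re\{b_s\mathbf{a}^T\mathbf{w}\} + b\,|\mathbf{a}^T\mathbf{w}|^2.
\end{aligned} \tag{9.79}
$$

Among all vectors that satisfy $\|\mathbf{w}\|^2 = N$, the third term is maximized when $\mathbf{w} = e^{-j\phi}\mathbf{a}^*$ for any $\phi$, while the second term is maximized similarly but only for $\phi = \arg(b_s)$. The resulting solution $\mathbf{w} = e^{-j\arg(b_s)}\mathbf{a}^*$ is achievable with a reconfigurable surface since $\mathbf{a}$ is a vector with unit-magnitude entries (phase-shifts) obtained from two array response vectors. Hence, the configuration that maximizes the total channel gain is parallel to the complex conjugate of the element-wise product $\mathbf{a}$ of the array response vectors to/from the surface and has an additional common phase-shift $-\arg\big(\sum_{\nu=0}^{S-1}\bar{c}_s^*[\nu]\,\bar{c}[\nu]\big)$ that aligns the static and controllable paths to the extent possible in an OFDM system.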
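The closed-form solution of Example 9.7 is easy to verify numerically: the configuration $\boldsymbol{\psi} = [1, \mathbf{w}^T]^T$ with $\mathbf{w} = e^{-j\arg(b_s)}\mathbf{a}^*$ should attain $b_{ss} + 2N|b_s| + bN^2$ in (9.79), and no other feasible configuration should do better. The helper names and test inputs below are hypothetical.

```python
import cmath


def total_gain(psi, h_bars):
    """Objective (9.75): sum over subcarriers of |psi^T h_bar[nu]|^2."""
    return sum(abs(sum(p * x for p, x in zip(psi, h))) ** 2 for h in h_bars)


def example_9_7_configuration(cs_bar, c_bar, a):
    """Closed-form maximizer for L_t = L_r = 1: psi = [1, e^{-j arg(b_s)} a^*]."""
    b_s = sum(cs.conjugate() * c for cs, c in zip(cs_bar, c_bar))
    rotation = cmath.exp(-1j * cmath.phase(b_s))
    return [1 + 0j] + [rotation * x.conjugate() for x in a]
```

With this choice, $\mathbf{a}^T\mathbf{w} = Ne^{-j\arg(b_s)}$, so both controllable terms in (9.79) reach their maxima simultaneously.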
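Algorithm 9.1 Constrained power iteration to maximize (9.75).

1: Initialization: Select $\boldsymbol{\psi}_0 = [1,\ldots,1]^T$ and number of iterations $L$
2: for $i = 0,\ldots,L-1$ do
3:     $\mathbf{w}_{i+1} \leftarrow \frac{\mathbf{A}\boldsymbol{\psi}_i}{\|\mathbf{A}\boldsymbol{\psi}_i\|}$
4:     $\phi \leftarrow \arg([\mathbf{w}_{i+1}]_1)$
5:     $\boldsymbol{\psi}_{i+1} \leftarrow \left[1, e^{j(\arg([\mathbf{w}_{i+1}]_2)-\phi)}, \ldots, e^{j(\arg([\mathbf{w}_{i+1}]_{N+1})-\phi)}\right]^T$
6: end for
7: Output: $\boldsymbol{\psi}_L$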
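Algorithm 9.1 can be sketched as follows, building $\mathbf{A}$ from the vectors $\bar{\mathbf{h}}[\nu]$ in (9.72). This is an illustration in the section's notation, not the reference implementation from [166]; the helper names are hypothetical, and the sketch assumes $\mathbf{A}\boldsymbol{\psi}_i \neq \mathbf{0}$ in every iteration. For a single subcarrier, $\mathbf{A}$ has rank one and (provided $\bar{\mathbf{h}}^T\mathbf{1} \neq 0$) one iteration already yields the narrowband optimum with total gain $\|\bar{\mathbf{h}}\|_1^2$, which gives a convenient sanity check.

```python
import cmath
import math


def gain_matrix(h_bars):
    """A = sum_nu conj(h_bar[nu]) h_bar[nu]^T from (9.75), stored as a list of rows."""
    n = len(h_bars[0])
    return [[sum(h[r].conjugate() * h[c] for h in h_bars) for c in range(n)]
            for r in range(n)]


def total_gain(psi, h_bars):
    """Objective (9.75): sum over subcarriers of |psi^T h_bar[nu]|^2."""
    return sum(abs(sum(p * x for p, x in zip(psi, h))) ** 2 for h in h_bars)


def constrained_power_iteration(A, iterations):
    """Algorithm 9.1: power iteration projected onto unit-modulus vectors with first entry 1."""
    psi = [1 + 0j] * len(A)                                      # step 1: psi_0 = [1, ..., 1]^T
    for _ in range(iterations):
        v = [sum(a * x for a, x in zip(row, psi)) for row in A]  # A psi_i
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        w = [x / norm for x in v]                                # step 3: w_{i+1}
        phi = cmath.phase(w[0])                                  # step 4
        psi = [cmath.exp(1j * (cmath.phase(x) - phi)) for x in w]  # step 5: first entry is 1
    return psi
```

Whatever the number of iterations, the objective can never exceed $\sum_{\nu}\|\bar{\mathbf{h}}[\nu]\|_1^2$, since $|\boldsymbol{\psi}^T\bar{\mathbf{h}}[\nu]| \leq \|\bar{\mathbf{h}}[\nu]\|_1$ for any unit-modulus $\boldsymbol{\psi}$.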


To evaluate the effectiveness of the power iteration method in general 
propagation scenarios, we can compare the capacity that it achieves with an 
upper bound. Suppose we could select a different value of $\boldsymbol{\psi}$ on each subcarrier. We could then simultaneously maximize the channel gains of all subcarriers by following the approach in (9.30)-(9.31) in the narrowband case. The resulting upper bound on the capacity can be expressed as

$$
C \leq \frac{B}{S+T}\sum_{\nu=0}^{S-1}\log_2\left(1 + \frac{q_\nu\|\bar{\mathbf{h}}[\nu]\|_1^2}{N_0}\right), \tag{9.80}
$$

where $\|\cdot\|_1$ denotes the 1-norm and $q_\nu = \max(\mu - N_0/\|\bar{\mathbf{h}}[\nu]\|_1^2, 0)$, for $\nu = 0,\ldots,S-1$, with $\mu$ selected to make $\sum_{\nu=0}^{S-1} q_\nu = qS$. The upper bound is only achieved with equality in the unlikely event that the same surface configuration happens to maximize the channel gains on all subcarriers simultaneously.

Figure 9.19 shows how the capacity varies with the bandwidth in a scenario of the kind illustrated in Figure 9.10. Specifically, a reconfigurable surface is deployed along the $yz$-plane with its reflective side facing the positive $x$-axis and its center at $(0,0,0)$ m. The base station and the user are located at $(40,-200,0)$ m and $(20,0,0)$ m, respectively. We assume LOS channels with multiple reflected paths to and from the surface, while the static channel is of NLOS nature. The surface has the size $0.5 \times 0.5$ m, which for a carrier frequency of 3 GHz corresponds to $N = 400$ metaatoms that each has the dimension $\lambda/4 \times \lambda/4$ (with $\lambda = 0.1$ m, each side fits $0.5/0.025 = 20$ metaatoms). The channels are modeled similarly to the 3GPP channel model in [168], and the capacity is averaged over random realizations of the multipath components' characteristics. The capacity in (9.73) is shown in Figure 9.19 as a function of the bandwidth $B$, assuming that the transmit power grows proportionally to the bandwidth so that the signal power spectral density is 1 W per MHz. The subcarrier spacing is 150 kHz, so both the number of subcarriers and the number of channel taps increase with $B$.

The dashed-dotted curve in Figure 9.19 uses the power iteration method, and it provides 96-99% of the upper bound from (9.80). This method will find a configuration that takes the signal power from the strongest incident direction and reflects it in the direction that maximizes the received signal power. Since there are LOS channels to and from the surface, this is approximately equal to taking the signal from the LOS path and beamforming it along the LOS path to the receiver. The performance gap grows with $B$ due to the increased frequency-selectivity, but since the LOS paths to/from the surface are stronger than the scattered paths, it is possible to find a single surface configuration that works well over the entire band. This is reminiscent of how the analog beamforming architecture can provide rates close to the capacity in LOS-dominant scenarios. A part of the gap between the power iteration method and the upper bound can be closed using a more advanced configuration algorithm (examples are given in [164], [165]), but at the expense of increased computational complexity.

Figure 9.19: The capacity achieved over a wideband OFDM channel grows proportionally to the bandwidth and can be improved using a reconfigurable surface. The capacity achieved when configuring the surface using the power iteration in Algorithm 9.1 is compared with the upper bound in (9.80), the use of a random configuration, and the removal of the surface.

The importance of properly configuring the surface is also illustrated in 
Figure 9.19. The dashed curve considers the average capacity over random 
phase-shift configurations with independent uniformly distributed phases from
$[-\pi, \pi)$, while the dotted curve considers the absence of a surface (i.e., it is
replaced by an absorbing material). There is barely any difference between 
these curves, but there is a huge performance gap compared to the power 
iteration method. Hence, deploying a reflecting surface in this setup is only 
meaningful if it is configured to beamform the signal toward the receiver. The 
capacity is increased by 2-3 times when doing that, which results in large bit 
rate differences when many MHz of bandwidth are used. 
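The following minimal narrowband SISO sketch shows why a random configuration barely helps. It is a simplification and does not reproduce the wideband setup of Figure 9.19. For illustration, it assumes a static coefficient with gain β_s and N cascaded per-metaatom coefficients, each with gain β_c and independent random phases. The function name `average_gains` is hypothetical. With random phases, the cascaded terms add incoherently, so the average gain grows only by Nβ_c. Phase-aligning them with the static path instead gives (√β_s + N√β_c)².

```python
import numpy as np

def average_gains(N, beta_s=1.0, beta_c=1e-4, trials=20000, seed=0):
    """Average end-to-end channel gain without a surface, with random phases,
    and with phase-aligned metaatoms (hypothetical narrowband SISO model)."""
    rng = np.random.default_rng(seed)
    # Static coefficient and cascaded coefficients via each metaatom (random phases)
    h_s = np.sqrt(beta_s) * np.exp(1j * rng.uniform(-np.pi, np.pi, trials))
    g = np.sqrt(beta_c) * np.exp(1j * rng.uniform(-np.pi, np.pi, (trials, N)))

    no_surface = np.mean(np.abs(h_s) ** 2)

    # Random configuration: independent phases uniformly distributed in [-pi, pi)
    psi = rng.uniform(-np.pi, np.pi, (trials, N))
    random_cfg = np.mean(np.abs(h_s + np.sum(g * np.exp(1j * psi), axis=1)) ** 2)

    # Optimized configuration: rotate every cascaded term into phase with h_s
    psi_opt = np.angle(h_s)[:, None] - np.angle(g)
    aligned = np.mean(np.abs(h_s + np.sum(g * np.exp(1j * psi_opt), axis=1)) ** 2)
    return float(no_surface), float(random_cfg), float(aligned)

print(average_gains(N=100))  # approximately (1.0, 1.01, 4.0)
```

With these illustrative values, the random configuration adds about 1% to the channel gain, while the aligned configuration quadruples it.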
9.4 MIMO Applications of Reconfigurable Surfaces


Reconfigurable surfaces can also be used to enhance MIMO channels, but 
there are limitations to what can be achieved since all signals reflected by 
a particular metaatom are phase-shifted identically. To shed light on the 
fundamentals using simple notation, we return to the narrowband case in 
this section and consider three different scenarios: point-to-point MIMO and 
multi-user MIMO communications, as well as multi-antenna target detection. 


9.4.1 Enhanced Point-to-Point MIMO Communication 


In the point-to-point SISO case considered previously in this chapter, the end-to-end channel coefficient was expressed in (9.23) as $h = h_s + \mathbf{h}_r^{\mathrm{T}} \mathbf{D}_\psi \mathbf{h}_t$. In a point-to-point MIMO scenario with K transmit antennas and M receive antennas, the channel matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ can be expressed similarly as

$$\mathbf{H} = \mathbf{H}_s + \mathbf{H}_r \mathbf{D}_\psi \mathbf{H}_t, \tag{9.81}$$

where $\mathbf{H}_s \in \mathbb{C}^{M \times K}$ is the static channel, $\mathbf{H}_t \in \mathbb{C}^{N \times K}$ is the channel from the transmitter to the surface, and $\mathbf{H}_r \in \mathbb{C}^{M \times N}$ is the channel from the surface to the receiver. These channel matrices can be modeled like any other MIMO channels because the propagation to/from the surface is the same as if the array of metaatoms were an array of antennas.

The reflection matrix does not change in the MIMO case but is defined as in (9.17): $\mathbf{D}_\psi = \mathrm{diag}(e^{j\psi_1}, \ldots, e^{j\psi_N})$. This matrix can adjust how the matrices $\mathbf{H}_r$ and $\mathbf{H}_t$ are multiplied in (9.81), but the flexibility is limited since
it contains N controllable phase parameters while there are respectively MN 
and KN coefficients in the channel matrices. These numbers were equal in the 
SISO case with M = K = 1, while there are many more channel coefficients 
than controllable parameters in the MIMO case. 

When matrices are multiplied, the rank of the resulting matrix is always less than or equal to the minimum rank of the individual matrices. The reflection matrix always has full rank, but the rank of $\mathbf{H}_r \mathbf{D}_\psi \mathbf{H}_t$ cannot surpass the minimum rank of the channel matrices $\mathbf{H}_r$ and $\mathbf{H}_t$. This implies that
the reconfigurable surface cannot improve the channel rank in any dramatic 
fashion. However, it can be configured to improve specific singular values, 
match the strongest channel dimensions from the two channel matrices, and 
ensure that the static and configurable terms in (9.81) fit well together. Since 
many possibilities exist, we should identify the surface configuration that 
maximizes the MIMO channel capacity. In this section, we will first provide 
some geometrical insights into what can be achieved and then derive a general 
algorithm that iteratively refines the configuration to increase capacity. 
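The rank bound can be checked numerically. The sketch below is illustrative only. It draws hypothetical real Gaussian factors so that $\mathbf{H}_r$ and $\mathbf{H}_t$ both have rank r. It then applies a random, full-rank reflection matrix and reports the rank of the cascaded term $\mathbf{H}_r \mathbf{D}_\psi \mathbf{H}_t$. The function name `cascaded_rank` is hypothetical.

```python
import numpy as np

def cascaded_rank(M=8, K=8, N=64, r=2, seed=1):
    """Rank of H_r D_psi H_t when H_r (M x N) and H_t (N x K) both have rank r."""
    rng = np.random.default_rng(seed)
    H_r = rng.standard_normal((M, r)) @ rng.standard_normal((r, N))
    H_t = rng.standard_normal((N, r)) @ rng.standard_normal((r, K))
    # Any phase-shift configuration gives a full-rank diagonal reflection matrix
    D = np.diag(np.exp(1j * rng.uniform(-np.pi, np.pi, N)))
    return int(np.linalg.matrix_rank(H_r @ D @ H_t))

print(cascaded_rank(r=2), cascaded_rank(r=8))  # 2 8 (generically)
```

No choice of phases can push the rank of the cascaded term above r.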
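Example 9.8. Suppose the surface is deployed to have far-field LOS channels to both the transmitter and receiver. How should the surface be configured to maximize the total end-to-end channel gain $\|\mathbf{H}\|_F^2$?

The matrices $\mathbf{H}_r$ and $\mathbf{H}_t$ have rank one under these conditions, as explained in Section 4.4.1. They could be expressed as outer products of array response vectors, but for notational convenience, we will express them as

$$\mathbf{H}_r = \mathbf{a}_r \mathbf{b}_r^{\mathrm{T}}, \quad \mathbf{H}_t = \mathbf{a}_t \mathbf{b}_t^{\mathrm{T}}, \tag{9.82}$$

where $\mathbf{a}_r \in \mathbb{C}^M$, $\mathbf{b}_r = [b_{r,1}, \ldots, b_{r,N}]^{\mathrm{T}} \in \mathbb{C}^N$, $\mathbf{a}_t = [a_{t,1}, \ldots, a_{t,N}]^{\mathrm{T}} \in \mathbb{C}^N$, and $\mathbf{b}_t \in \mathbb{C}^K$ are vectors. We can then simplify (9.81) as

$$\mathbf{H} = \mathbf{H}_s + \mathbf{a}_r \underbrace{\mathbf{b}_r^{\mathrm{T}} \mathbf{D}_\psi \mathbf{a}_t}_{=\alpha} \mathbf{b}_t^{\mathrm{T}}, \tag{9.83}$$

where $\alpha = \mathbf{b}_r^{\mathrm{T}} \mathbf{D}_\psi \mathbf{a}_t \in \mathbb{C}$ is a scalar. The second term adds the rank-one matrix $\mathbf{a}_r \mathbf{b}_t^{\mathrm{T}}$ to the static channel with a scaling that is determined by the reflection matrix. The total end-to-end channel gain can be rewritten using (5.88) as

$$\begin{aligned}
\|\mathbf{H}\|_F^2 &= \mathrm{tr}(\mathbf{H}^{\mathrm{H}}\mathbf{H}) \\
&= \mathrm{tr}(\mathbf{H}_s^{\mathrm{H}}\mathbf{H}_s) + |\alpha|^2 \mathrm{tr}(\mathbf{b}_t^{*} \mathbf{a}_r^{\mathrm{H}} \mathbf{a}_r \mathbf{b}_t^{\mathrm{T}}) + \mathrm{tr}(\alpha \mathbf{H}_s^{\mathrm{H}} \mathbf{a}_r \mathbf{b}_t^{\mathrm{T}}) + \mathrm{tr}(\alpha^{*} \mathbf{b}_t^{*} \mathbf{a}_r^{\mathrm{H}} \mathbf{H}_s) \\
&= \|\mathbf{H}_s\|_F^2 + |\alpha|^2 \|\mathbf{a}_r\|^2 \|\mathbf{b}_t\|^2 + 2\Re\left(\alpha \mathbf{b}_t^{\mathrm{T}} \mathbf{H}_s^{\mathrm{H}} \mathbf{a}_r\right),
\end{aligned} \tag{9.84}$$

where the last equality follows from (2.52), which states how one can shift the order of matrices in the trace function. The final expression is maximized when $|\alpha|$ takes its largest possible value while its phase makes the last term positive. Since $\alpha = \sum_{n=1}^{N} b_{r,n} a_{t,n} e^{j\psi_n}$, its magnitude is at most $\sum_{n=1}^{N} |b_{r,n} a_{t,n}|$, with equality when all N terms have the same phase. Both conditions are satisfied when $\psi_n = -\arg(\mathbf{b}_t^{\mathrm{T}} \mathbf{H}_s^{\mathrm{H}} \mathbf{a}_r) - \arg(b_{r,n} a_{t,n}) + 2\pi k_n$, where $k_n$ is the integer that gives $\psi_n \in [-\pi, \pi)$, for n = 1, ..., N. When the static channel is weak, maximizing $\|\mathbf{H}\|_F^2$ is approximately equivalent to maximizing the channel capacity because there will only be one strong singular value, which is amplified using the surface.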
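The sketch below is a numerical sanity check of Example 9.8. It replaces the LOS array response vectors with hypothetical i.i.d. complex Gaussian vectors, since the derivation only relies on the rank-one structure in (9.82). The static channel scaling of 0.1 is also an illustrative choice. The code evaluates $\|\mathbf{H}\|_F^2$ for the closed-form phases and compares it with the predicted value $\|\mathbf{H}_s\|_F^2 + s^2\|\mathbf{a}_r\|^2\|\mathbf{b}_t\|^2 + 2s|\mathbf{b}_t^{\mathrm{T}}\mathbf{H}_s^{\mathrm{H}}\mathbf{a}_r|$, where $s = \sum_n |b_{r,n}a_{t,n}|$. It also compares against the best of many random configurations.

```python
import numpy as np

def example_9_8(M=4, K=3, N=32, trials=2000, seed=2):
    rng = np.random.default_rng(seed)
    crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
    H_s = 0.1 * crandn(M, K)                           # weak static channel (illustrative)
    a_r, b_r, a_t, b_t = crandn(M), crandn(N), crandn(N), crandn(K)
    H_r, H_t = np.outer(a_r, b_r), np.outer(a_t, b_t)  # rank-one channels as in (9.82)

    def gain(psi):                                     # ||H||_F^2 with H from (9.81)
        H = H_s + H_r @ np.diag(np.exp(1j * psi)) @ H_t
        return np.linalg.norm(H, 'fro') ** 2

    c = b_t @ H_s.conj().T @ a_r                       # b_t^T H_s^H a_r
    psi = -np.angle(c) - np.angle(b_r * a_t)           # closed-form phases
    psi = (psi + np.pi) % (2 * np.pi) - np.pi          # wrap to [-pi, pi)

    s = np.sum(np.abs(b_r * a_t))                      # largest achievable |alpha|
    predicted = (np.linalg.norm(H_s, 'fro') ** 2
                 + s ** 2 * np.linalg.norm(a_r) ** 2 * np.linalg.norm(b_t) ** 2
                 + 2 * s * abs(c))
    best_random = max(gain(rng.uniform(-np.pi, np.pi, N)) for _ in range(trials))
    return float(gain(psi)), float(predicted), float(best_random)

opt, pred, rand = example_9_8()
print(opt, pred, rand)   # opt equals pred (up to rounding) and exceeds rand
```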


As noted earlier in this chapter, deploying the reconfigurable surface to 
have LOS channels to the transmitter and receiver is desirable. However, 
in contrast to the example, the channel matrices will not have rank one in 
practice due to multipath propagation. This makes it harder to compute the 
optimal phase-shift configuration directly. The design principle remains the 
same, as can be showcased using the beamspace representation. Suppose the 
transmitter and receiver are equipped with half-wavelength-spaced ULAs. As 
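explained in Section 5.6.2, we can then transform the channel matrix to the beamspace by multiplying by DFT matrices from the left and the right:

$$\check{\mathbf{H}} = \mathbf{F}_M^{\mathrm{H}} \mathbf{H} \mathbf{F}_K^{*} = \mathbf{F}_M^{\mathrm{H}} \mathbf{H}_s \mathbf{F}_K^{*} + \mathbf{F}_M^{\mathrm{H}} \mathbf{H}_r \mathbf{D}_\psi \mathbf{H}_t \mathbf{F}_K^{*}. \tag{9.85}$$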


The first term represents all the static multipath clusters, while the second term represents the propagation paths affected by the reconfigurable surface. Figure 9.20 exemplifies how these matrices depend on the angular geometry. It continues Figure 5.33, which considered the same setup without a reconfigurable surface. Each of the six static multipath clusters contributes to the entry of $\mathbf{F}_M^{\mathrm{H}} \mathbf{H}_s \mathbf{F}_K^{*}$ with the matching color. A white entry means its value is nearly zero because no propagation path connects that pair of transmit/receive directions. Since we consider a deployment with LOS to the transmitter and receiver, the reconfigurable surface mainly contributes to one entry, determined by its physical location. However, it could slightly affect a few neighboring entries through the multipath clusters around it, as illustrated by the brightness of the coloring (brighter means smaller).

Figure 9.20: A transmitter with a half-wavelength-spaced ULA with K = 5 antennas communicates with a receiver having a half-wavelength-spaced ULA with M = 10 antennas. The communication is aided by a reconfigurable surface that adds extra paths to the MIMO channel. In the beamspace representation, these extra paths are concentrated around a few angular directions since the surface mostly interacts with multipath components in its vicinity. This is a continuation of Figure 5.33 where no reconfigurable surfaces existed. Note that the transmitter and receiver sizes are exaggerated compared to the propagation distances.


The rank of $\check{\mathbf{H}}$ is the same as the rank of $\mathbf{H}$ because the DFT matrices are unitary. The static channel matrix describes the contributions from all multipath clusters that create paths between the transmitter and the receiver. Clusters seen from very different angles barely interact and contribute to different singular values in $\check{\mathbf{H}}$. The reconfigurable surface can increase the channel rank by creating new non-zero entries. Since the surface is deployed at a specific location, it will primarily interact with the propagation environment in its vicinity. Hence, all the new propagation paths it creates have similar angles from the transmitter's and receiver's perspectives. This implies that we should expect the surface to mainly contribute to one or a few singular values in $\check{\mathbf{H}}$. The reconfigurable surface can raise the channel
capacity by selecting a good phase-shift configuration, but the increase will 
resemble the SISO case since only one channel dimension is improved. Multiple 
reconfigurable surfaces deployed at physically diverse locations are generally 
required to enhance multiple singular values substantially. An analogy can 
be made with the analog/hybrid beamforming considered in Sections 7.3.1 
and 7.3.2: a single surface is like an analog beamforming architecture that can 
only receive/transmit one signal, while multiple surfaces are like the hybrid 
beamforming architecture that can jointly receive/transmit as many signals 
as there are surfaces. While this is generally true, there are specific scenarios 
where the surface contributes to many singular values [169]. This happens 
when the arrays and surface are very large compared to the wavelength so 
that spherical wavefronts can be utilized to achieve high-rank channels via 
the surface. The enabling factor is the same as in Section 4.4.3, where we 
showed how to achieve full-rank LOS channels by making the antenna spac- 
ing sufficiently large compared to the propagation distance. Similarly, one 
can make the reconfigurable surface so large compared to the propagation 
distances that it acts as multiple surfaces. 


For a given MIMO channel matrix, the capacity is achieved by transmitting 
along the right singular vectors and allocating power using water-filling (see 
Theorem 3.1). If we modify the channel matrix by refining the reflection 
matrix Dy, the singular vectors and values will change, and so will the 
capacity-achieving precoding. We will describe an algorithm from [170] that 
progressively improves the MIMO capacity by iterating between updating the 
precoding /water-filling and reflection matrix. We recall from (3.100) that the 
capacity for a given channel matrix H can be expressed as 


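$$C = \log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\mathbf{H}\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\mathbf{H}^{\mathrm{H}}\right)\right). \tag{9.86}$$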


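We introduce the notation $\mathbf{H}_r = [\mathbf{h}_{r,1}, \ldots, \mathbf{h}_{r,N}]$ and $\mathbf{H}_t = [\vec{\mathbf{h}}_{t,1}^{\mathrm{T}}, \ldots, \vec{\mathbf{h}}_{t,N}^{\mathrm{T}}]^{\mathrm{T}}$, where $\vec{\mathbf{h}}_{t,n}$ is the nth row of $\mathbf{H}_t$ and the arrow notation points out that rows are horizontal. We can then rewrite the channel matrix in (9.81) as

$$\mathbf{H} = \mathbf{H}_s + \mathbf{H}_r\mathbf{D}_\psi\mathbf{H}_t = \mathbf{H}_s + \sum_{i=1}^{N} \mathbf{h}_{r,i} e^{j\psi_i} \vec{\mathbf{h}}_{t,i} = \tilde{\mathbf{H}}_n + e^{j\psi_n}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}, \tag{9.87}$$

where $\tilde{\mathbf{H}}_n = \mathbf{H}_s + \sum_{i=1, i\neq n}^{N} \mathbf{h}_{r,i} e^{j\psi_i}\vec{\mathbf{h}}_{t,i}$ contains all the terms except the one involving $\psi_n$. We want to reconfigure this phase-shift to increase the capacity when all other parameters are fixed. By substituting (9.87) into (9.86), we can express the capacity as

$$\begin{aligned}
&\log_2\left(\det\left(\mathbf{I}_M + \frac{1}{N_0}\left(\tilde{\mathbf{H}}_n + e^{j\psi_n}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}\right)\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\left(\tilde{\mathbf{H}}_n + e^{j\psi_n}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}\right)^{\mathrm{H}}\right)\right) \\
&= \log_2\left(\det\left(\mathbf{A}_n + e^{j\psi_n}\mathbf{h}_{r,n}\mathbf{b}_n^{\mathrm{H}} + e^{-j\psi_n}\mathbf{b}_n\mathbf{h}_{r,n}^{\mathrm{H}}\right)\right) \\
&= \log_2\left(\det(\mathbf{A}_n)\right) + \log_2\left(\det\left(\mathbf{I}_M + e^{j\psi_n}\mathbf{A}_n^{-1}\mathbf{h}_{r,n}\mathbf{b}_n^{\mathrm{H}} + e^{-j\psi_n}\mathbf{A}_n^{-1}\mathbf{b}_n\mathbf{h}_{r,n}^{\mathrm{H}}\right)\right),
\end{aligned} \tag{9.88}$$

where the terms that are independent of $\psi_n$ are included in

$$\mathbf{A}_n = \mathbf{I}_M + \frac{1}{N_0}\tilde{\mathbf{H}}_n\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\tilde{\mathbf{H}}_n^{\mathrm{H}} + \frac{1}{N_0}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\vec{\mathbf{h}}_{t,n}^{\mathrm{H}}\mathbf{h}_{r,n}^{\mathrm{H}}, \tag{9.89}$$

$$\mathbf{b}_n = \frac{1}{N_0}\tilde{\mathbf{H}}_n\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}\vec{\mathbf{h}}_{t,n}^{\mathrm{H}}. \tag{9.90}$$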


Only the determinant in the second term of (9.88) depends on the phase-shift. 
This determinant can be computed as 


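$$\begin{aligned}
&\det\left(\mathbf{I}_M + \begin{bmatrix}\mathbf{A}_n^{-1}\mathbf{h}_{r,n} & \mathbf{A}_n^{-1}\mathbf{b}_n\end{bmatrix}\begin{bmatrix} e^{j\psi_n}\mathbf{b}_n^{\mathrm{H}} \\ e^{-j\psi_n}\mathbf{h}_{r,n}^{\mathrm{H}}\end{bmatrix}\right) \\
&= \det\left(\mathbf{I}_2 + \begin{bmatrix} e^{j\psi_n}\mathbf{b}_n^{\mathrm{H}} \\ e^{-j\psi_n}\mathbf{h}_{r,n}^{\mathrm{H}}\end{bmatrix}\begin{bmatrix}\mathbf{A}_n^{-1}\mathbf{h}_{r,n} & \mathbf{A}_n^{-1}\mathbf{b}_n\end{bmatrix}\right) \\
&= \left(1 + e^{j\psi_n}\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n}\right)\left(1 + e^{-j\psi_n}\mathbf{h}_{r,n}^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{b}_n\right) - \mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{b}_n\,\mathbf{h}_{r,n}^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n} \\
&= e^{j\psi_n}\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n} + e^{-j\psi_n}\mathbf{h}_{r,n}^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{b}_n + \text{constants},
\end{aligned} \tag{9.91}$$

where the first equality follows from Sylvester's determinant theorem in (2.53), and we then compute the determinant of the resulting 2 × 2 matrix. Since $\mathbf{A}_n$ is Hermitian, the first two terms in the final expression of (9.91) are complex conjugates of each other. Their sum therefore equals $2\Re(e^{j\psi_n}\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n})$, which is maximized when $e^{j\psi_n}\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n}$ is real and positive. This is achieved by

$$\psi_n = -\arg\left(\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n}\right). \tag{9.92}$$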
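The closed-form update (9.92) can be compared against a brute-force grid search over a single phase. This sketch assumes hypothetical i.i.d. complex Gaussian channels. It also replaces $\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}$ with an illustrative fixed covariance $\mathbf{S} = \mathbf{I}_K/K$. Only the nth phase varies, and the function names are hypothetical.

```python
import numpy as np

def crandn(rng, *shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def check_phase_update(M=4, K=3, N=16, n=5, N0=1.0, seed=3):
    rng = np.random.default_rng(seed)
    H_s, H_r, H_t = crandn(rng, M, K), crandn(rng, M, N), crandn(rng, N, K)
    psi = rng.uniform(-np.pi, np.pi, N)
    S = np.eye(K) / K                  # stands in for V Q^opt V^H (illustrative)

    def capacity(p):                   # (9.86) with the fixed covariance S
        H = H_s + H_r @ np.diag(np.exp(1j * p)) @ H_t
        return np.log2(np.linalg.det(np.eye(M) + H @ S @ H.conj().T / N0).real)

    h_r, h_t = H_r[:, n], H_t[n, :]    # nth column of H_r, nth row of H_t
    H_full = H_s + H_r @ np.diag(np.exp(1j * psi)) @ H_t
    H_tilde = H_full - np.exp(1j * psi[n]) * np.outer(h_r, h_t)            # (9.87)
    A = (np.eye(M) + H_tilde @ S @ H_tilde.conj().T / N0
         + (h_t @ S @ h_t.conj()).real * np.outer(h_r, h_r.conj()) / N0)    # (9.89)
    b = H_tilde @ S @ h_t.conj() / N0                                      # (9.90)
    psi_star = -np.angle(b.conj() @ np.linalg.solve(A, h_r))                # (9.92)

    candidate = psi.copy()
    candidate[n] = psi_star
    grid = np.linspace(-np.pi, np.pi, 2001)
    best_grid = max(capacity(np.where(np.arange(N) == n, g, psi)) for g in grid)
    return float(capacity(candidate)), float(best_grid)

print(check_phase_update())  # the first value is at least as large as the second
```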


We can use this result to obtain the iterative procedure described in Algorithm 9.2. The algorithm begins by computing the capacity-achieving signal covariance matrix $\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}$ for an initial set of phase-shifts. It then refines the N phase-shifts sequentially using (9.92). When this is done, the capacity-achieving signal covariance matrix is recomputed for the new channel matrix obtained with the new reflection matrix, and the procedure is repeated L times. Each step in the algorithm either improves the achievable rate or keeps it fixed because we can always choose not to modify the phase. The rate will eventually converge when no further changes are beneficial. However, a consequence of sequential optimization is that the algorithm might not converge to the best possible configuration. It may instead converge only to a locally optimal solution, where the capacity cannot be increased further unless multiple phases are updated simultaneously. The channel matrices must be estimated before running Algorithm 9.2. No extra wireless signaling is required while running the algorithm, which can be executed at the receiver. Hence, the procedure shown in Figure 9.14 can still be followed: the transmitter sends pilots while the surface switches between predefined configurations, and then the receiver computes the preferred configuration and sends it to the surface.

Algorithm 9.2 Reconfigurable surface configuration for point-to-point MIMO capacity maximization.
1: Initialization: Set $\psi_1, \ldots, \psi_N$ randomly and select the number of iterations L
2: for i = 1, ..., L do
3:     Compute the capacity-achieving covariance matrix $\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}$ for the channel matrix in (9.81) with $\mathbf{D}_\psi = \mathrm{diag}(e^{j\psi_1}, \ldots, e^{j\psi_N})$
4:     for n = 1, ..., N do
5:         Compute $\mathbf{A}_n$ in (9.89) and $\mathbf{b}_n$ in (9.90) for fixed $\psi_1, \ldots, \psi_N$
6:         $\psi_n \leftarrow -\arg(\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n})$
7:     end for
8: end for
9: Output: $\psi_1, \ldots, \psi_N$
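A compact sketch of Algorithm 9.2 follows. It assumes a total transmit energy q per symbol, so that $\mathrm{tr}(\mathbf{V}\mathbf{Q}^{\mathrm{opt}}\mathbf{V}^{\mathrm{H}}) = q$, and uses standard water-filling over the squared singular values divided by $N_0$. The helper names are hypothetical, and the random channels in the usage lines are illustrative rather than the Figure 9.21 setup. The returned history lists the capacity at the start of each outer iteration plus the final value. The monotonicity argument in the text implies that this sequence should be non-decreasing.

```python
import numpy as np

def water_filling(gains, q):
    """Powers p_k = max(mu - 1/gains_k, 0) with sum(p) = q; assumes gains_k > 0."""
    order = np.argsort(gains)[::-1]              # strongest eigenmodes first
    inv = 1.0 / gains[order]
    p = np.zeros(len(gains))
    for m in range(len(gains), 0, -1):           # largest feasible active set
        mu = (q + inv[:m].sum()) / m             # water level for m active modes
        if mu > inv[m - 1]:
            p[order[:m]] = mu - inv[:m]
            break
    return p

def capacity_achieving_cov(H, q, N0):
    """V Q^opt V^H: right singular vectors of H with water-filling powers."""
    _, s, Vh = np.linalg.svd(H)
    V = Vh.conj().T[:, :len(s)]
    p = water_filling(s ** 2 / N0, q)
    return V @ np.diag(p) @ V.conj().T

def capacity(H, S, N0):
    """log2 det(I_M + H S H^H / N0), i.e., (9.86) for a given covariance S."""
    M = H.shape[0]
    return float(np.log2(np.linalg.det(np.eye(M) + H @ S @ H.conj().T / N0).real))

def algorithm_9_2(H_s, H_r, H_t, q, N0=1.0, L=5, seed=0):
    rng = np.random.default_rng(seed)
    M, N = H_r.shape
    psi = rng.uniform(-np.pi, np.pi, N)                              # step 1
    channel = lambda ph: H_s + H_r @ np.diag(np.exp(1j * ph)) @ H_t  # (9.81)
    history = []
    for _ in range(L):                                               # step 2
        S = capacity_achieving_cov(channel(psi), q, N0)              # step 3
        history.append(capacity(channel(psi), S, N0))
        for n in range(N):                                           # step 4
            h_r, h_t = H_r[:, n], H_t[n, :]          # nth column of H_r, nth row of H_t
            H_tilde = channel(psi) - np.exp(1j * psi[n]) * np.outer(h_r, h_t)
            A = (np.eye(M) + H_tilde @ S @ H_tilde.conj().T / N0
                 + (h_t @ S @ h_t.conj()).real * np.outer(h_r, h_r.conj()) / N0)  # (9.89)
            b = H_tilde @ S @ h_t.conj() / N0                                     # (9.90)
            psi[n] = -np.angle(b.conj() @ np.linalg.solve(A, h_r))                # (9.92)
    H = channel(psi)
    history.append(capacity(H, capacity_achieving_cov(H, q, N0), N0))
    return psi, history

rng = np.random.default_rng(7)
crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
psi, history = algorithm_9_2(crandn(4, 4), 0.1 * crandn(4, 50), 0.1 * crandn(50, 4), q=1.0)
print(np.round(history, 3))   # non-decreasing capacity values (bit/symbol)
```

The channel matrix is rebuilt at every phase update to keep the sketch close to the algorithm box. A practical implementation would instead update the channel matrix incrementally.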

Figure 9.21 shows how the capacity improves with the iteration index of Algorithm 9.2 in a point-to-point MIMO scenario with M = K ∈ {1, 2, 4, 8} antennas and N = 100 metaatoms. The SNR of the static path is 0 dB, and $\mathbf{H}_s$ has i.i.d. Rayleigh fading entries. The channel matrices $\mathbf{H}_t$ and $\mathbf{H}_r$ via the surface are subject to Rician fading with the κ-factor κ = 10, where the NLOS part has an i.i.d. Rayleigh fading distribution (see Example 5.18). The cascaded path via a single metaatom has the SNR −10 dB. The results are averaged over many channel realizations.

The iteration index 0 in Figure 9.21 represents the initial state when the phase-shifts are uniformly distributed between 0 and 2π, thereby approximating diffuse scattering. The first iteration of Algorithm 9.2 leads to a substantial capacity improvement, while only minor improvements occur in the subsequent iterations. The vertical gaps between the curves are larger at the last iteration than at the first. This shows that a system with more antennas benefits slightly more from having a well-configured surface, but the difference is small because the surface mainly contributes to one singular value. We use the optimal configuration from Corollary 9.1 when considering the SISO case with M = K = 1, which is why that curve does not vary with the iteration index. Interestingly, the optimal SISO configuration leads to a higher capacity than the initial capacity in the 2 × 2 and 4 × 4 MIMO setups, reiterating the importance of correctly configuring the surface. In summary, the deployment of a reconfigurable surface can greatly improve the capacity of a point-to-point MIMO system.

Figure 9.21: The point-to-point MIMO capacity as a function of the iteration index when running Algorithm 9.2 to iteratively select the phase-shifts of the reconfigurable surface to increase the capacity. The index 0 represents the initial random configuration. The optimal configuration is directly used in the SISO case (M = K = 1) since it is known in closed form.

⁶ The initial phase-shifts determine which configuration Algorithm 9.2 converges to. A simple way to explore whether better solutions exist is to consider multiple random phase-shift initializations and compare the capacity values they converge to.


9.4.2 Enhanced Multi-User MIMO Communication 


A reconfigurable surface can also improve the communication performance 
over multi-user MIMO channels. As explained in Chapter 6, the precoding 
and combining differ substantially from the point-to-point case because users 
are not collaborating in the signal processing and measure their capacity 
separately. Nevertheless, the sum capacity expression in multi-user MIMO 
resembles the capacity expression in point-to-point MIMO, which implies that 
we can use similar algorithms to optimize the reconfigurable surface. 

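The uplink sum capacity in a multi-user MIMO system with K single-antenna users and M antennas at the base station is given in (6.49) as

$$\log_2\left(\det\left(\mathbf{I}_M + \frac{q}{N_0}\mathbf{H}\mathbf{H}^{\mathrm{H}}\right)\right) \text{ bit/symbol}, \tag{9.93}$$

where $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K] \in \mathbb{C}^{M \times K}$ is the channel matrix and q = P/B is the signal energy per symbol. When the uplink is aided by a reconfigurable surface with N metaatoms, the channel $\mathbf{h}_k \in \mathbb{C}^M$ from user k can be expressed as

$$\mathbf{h}_k = \mathbf{h}_{s,k} + \mathbf{H}_r\mathbf{D}_\psi\mathbf{h}_{t,k}, \tag{9.94}$$

where $\mathbf{h}_{s,k} \in \mathbb{C}^M$ is the static channel, $\mathbf{h}_{t,k} \in \mathbb{C}^N$ is the channel from the user to the surface, and $\mathbf{H}_r \in \mathbb{C}^{M \times N}$ is the channel from the surface to the receiver. This is the SIMO counterpart to the channel model in (9.81). The term $\mathbf{H}_r\mathbf{D}_\psi\mathbf{h}_{t,k}$ in (9.94) can be viewed as a linear mapping of the user-specific channel vector $\mathbf{h}_{t,k}$ by the matrix $\mathbf{H}_r\mathbf{D}_\psi$. This matrix is the same for all users but can be controlled using the reflection matrix $\mathbf{D}_\psi = \mathrm{diag}(e^{j\psi_1}, \ldots, e^{j\psi_N})$. The combined channel matrix of all users becomes

$$\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K] = \underbrace{[\mathbf{h}_{s,1}, \ldots, \mathbf{h}_{s,K}]}_{=\mathbf{H}_s} + \mathbf{H}_r\mathbf{D}_\psi\underbrace{[\mathbf{h}_{t,1}, \ldots, \mathbf{h}_{t,K}]}_{=\mathbf{H}_t}, \tag{9.95}$$

which has the same form $\mathbf{H} = \mathbf{H}_s + \mathbf{H}_r\mathbf{D}_\psi\mathbf{H}_t$ as in the point-to-point MIMO case. In particular, the reflection matrix enters the equation in the same way.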


Example 9.9. Suppose the surface is deployed to have a far-field LOS channel to the base station. How can it modify the user channels in this case?

The matrix $\mathbf{H}_r$ has rank one under these conditions, as explained in Section 4.4.1, and can be expressed as $\mathbf{H}_r = \mathbf{a}_r\mathbf{b}_r^{\mathrm{T}}$ for some vectors $\mathbf{a}_r \in \mathbb{C}^M$ and $\mathbf{b}_r \in \mathbb{C}^N$. The channel of user k in (9.94) then becomes

$$\mathbf{h}_k = \mathbf{h}_{s,k} + \mathbf{a}_r\underbrace{\mathbf{b}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_{t,k}}_{=\alpha_k}. \tag{9.96}$$

This implies that the surface adds a component $\alpha_k\mathbf{a}_r$ to the static channel vector. Only the complex scaling factor $\alpha_k$ can be controlled, and it depends on the user index. If the static channels are blocked (i.e., $\mathbf{h}_{s,k} = \mathbf{0}$ for all k), the K channel vectors are parallel. We cannot suppress interference under such circumstances; thus, FDMA achieves the same sum capacity as multi-user MIMO in this case. In conclusion, the reconfigurable surface cannot enable multi-user MIMO communications on its own. However, it can improve performance by adding the components $\alpha_k\mathbf{a}_r$, which make the channels $\mathbf{h}_k$ more diverse than the original static channels.
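A small numerical illustration of Example 9.9 follows, using hypothetical i.i.d. complex Gaussian vectors for $\mathbf{a}_r$, $\mathbf{b}_r$, and the user-to-surface channels. When the static channels are blocked, the channel matrix created by the surface has rank one regardless of the phases. Adding unblocked static channels restores full rank.

```python
import numpy as np

def example_9_9_ranks(M=8, K=4, N=32, seed=4):
    rng = np.random.default_rng(seed)
    crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
    a_r, b_r = crandn(M), crandn(N)                   # rank-one H_r = a_r b_r^T
    H_t = crandn(N, K)                                # columns are h_{t,k}
    D = np.diag(np.exp(1j * rng.uniform(-np.pi, np.pi, N)))
    alpha = b_r @ D @ H_t                             # alpha_k = b_r^T D_psi h_{t,k}
    H_surface = np.outer(a_r, alpha)                  # kth column is alpha_k a_r
    assert np.allclose(H_surface, np.outer(a_r, b_r) @ D @ H_t)
    H_s = crandn(M, K)                                # unblocked static channels
    return int(np.linalg.matrix_rank(H_surface)), int(np.linalg.matrix_rank(H_s + H_surface))

print(example_9_9_ranks())  # (1, 4): blocked static paths give parallel user channels
```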


Since there are K user capacities to consider in multi-user MIMO systems, 
different phase-shift configurations are preferred for different users. In other 
words, the reconfigurable surface bends the shape of the capacity region, and 
there is typically no configuration that results in a region that is larger than 
all other achievable regions in all user dimensions. In this section, we will 
concentrate on maximizing the sum capacity in (9.93). Similarly to the last 
section, we will develop an iterative algorithm that updates one of the N 
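phase-shifts at a time to increase the capacity. When refining the phase $\psi_n$ of metaatom n, it is convenient to express the channel matrix $\mathbf{H} = \mathbf{H}_s + \mathbf{H}_r\mathbf{D}_\psi\mathbf{H}_t$ in (9.95) as

$$\mathbf{H} = \mathbf{H}_s + \sum_{i=1}^{N}\mathbf{h}_{r,i}e^{j\psi_i}\vec{\mathbf{h}}_{t,i} = \underbrace{\mathbf{H}_s + \sum_{i=1, i\neq n}^{N}\mathbf{h}_{r,i}e^{j\psi_i}\vec{\mathbf{h}}_{t,i}}_{=\tilde{\mathbf{H}}_n} + e^{j\psi_n}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}, \tag{9.97}$$

using the notation $\mathbf{H}_r = [\mathbf{h}_{r,1}, \ldots, \mathbf{h}_{r,N}]$ and $\mathbf{H}_t = [\vec{\mathbf{h}}_{t,1}^{\mathrm{T}}, \ldots, \vec{\mathbf{h}}_{t,N}^{\mathrm{T}}]^{\mathrm{T}}$. Note that $\vec{\mathbf{h}}_{t,n}$ is the nth row of $\mathbf{H}_t$ and differs from the user channel vector $\mathbf{h}_{t,k}$ appearing as the kth column of the matrix. By substituting (9.97) into (9.93), we can express the sum capacity as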


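$$\begin{aligned}
&\log_2\left(\det\left(\mathbf{I}_M + \frac{q}{N_0}\left(\tilde{\mathbf{H}}_n + e^{j\psi_n}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}\right)\left(\tilde{\mathbf{H}}_n + e^{j\psi_n}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}\right)^{\mathrm{H}}\right)\right) \\
&= \log_2\left(\det\left(\mathbf{A}_n + e^{j\psi_n}\mathbf{h}_{r,n}\mathbf{b}_n^{\mathrm{H}} + e^{-j\psi_n}\mathbf{b}_n\mathbf{h}_{r,n}^{\mathrm{H}}\right)\right) \\
&= \log_2\left(\det(\mathbf{A}_n)\right) + \log_2\left(\det\left(\mathbf{I}_M + e^{j\psi_n}\mathbf{A}_n^{-1}\mathbf{h}_{r,n}\mathbf{b}_n^{\mathrm{H}} + e^{-j\psi_n}\mathbf{A}_n^{-1}\mathbf{b}_n\mathbf{h}_{r,n}^{\mathrm{H}}\right)\right),
\end{aligned} \tag{9.98}$$

where the terms that are independent of $\psi_n$ are included in

$$\mathbf{b}_n = \frac{q}{N_0}\tilde{\mathbf{H}}_n\vec{\mathbf{h}}_{t,n}^{\mathrm{H}}, \quad \mathbf{A}_n = \mathbf{I}_M + \frac{q}{N_0}\tilde{\mathbf{H}}_n\tilde{\mathbf{H}}_n^{\mathrm{H}} + \frac{q}{N_0}\mathbf{h}_{r,n}\vec{\mathbf{h}}_{t,n}\vec{\mathbf{h}}_{t,n}^{\mathrm{H}}\mathbf{h}_{r,n}^{\mathrm{H}}. \tag{9.99}$$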
It remains to select the phase-shift to maximize the second determinant in 
(9.98), and this problem has the same form as in (9.91) of the point-to-point 
MIMO case. Hence, the optimal phase is obtained from (9.92) as 


$$\psi_n = -\arg\left(\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n}\right), \tag{9.100}$$


but using the expressions for $\mathbf{b}_n$ and $\mathbf{A}_n$ defined above. By sequentially updating the N phases using (9.100), we obtain Algorithm 9.3. This algorithm
resembles Algorithm 9.2 for the point-to-point MIMO case, but a key differ- 
ence is that the precoding is not updated in multi-user MIMO because the 
sum capacity is always achieved when the users transmit their signals using 
maximum power. Each step in the algorithm either improves the sum capacity 
or keeps it fixed because we can always choose not to modify the phase; thus, 
the sum capacity gradually increases and converges to a final value. We let 
L denote the predefined number of iterations to consider, but the algorithm 
can also be terminated earlier when the sum capacity has not been improved 
much from one iteration to the next. Although the sum capacity improves 
monotonically, there is no guarantee that the algorithm will converge to the 
best conceivable configuration because the variables are optimized sequentially 
rather than jointly. 

Figure 9.22 shows how the sum capacity of an uplink multi-user MIMO system increases with the number of metaatoms. There are K = 4 users and M = 10 receive antennas, and the static channel is modeled as in Figure 6.16. The channel matrix $\mathbf{H}_r$ between the base station and reconfigurable surface is modeled as Rician fading with the κ-factor κ = 10, and the channels between the users and surface are subject to i.i.d. Rayleigh fading. The SNR of the static path is 0 dB, while the cascaded path via a single metaatom has the SNR −20 dB. The results are averaged over many channel realizations. We notice that the sum capacity grows rapidly with the number of metaatoms when Algorithm 9.3 is used; hence, deploying the reconfigurable surface in this particular setup makes a great difference. A single iteration (L = 1) of the algorithm is sufficient to outperform the initial configuration with random phase-shifts. Further capacity improvements are achieved by running L = 5 iterations of the algorithm, especially when there are many metaatoms to configure.

Algorithm 9.3 Reconfigurable surface configuration for uplink multi-user MIMO sum capacity maximization.
1: Initialization: Set $\psi_1, \ldots, \psi_N$ randomly and select the number of iterations L
2: for i = 1, ..., L do
3:     for n = 1, ..., N do
4:         Compute $\mathbf{A}_n$ and $\mathbf{b}_n$ in (9.99) using current $\psi_1, \ldots, \psi_N$
5:         $\psi_n \leftarrow -\arg(\mathbf{b}_n^{\mathrm{H}}\mathbf{A}_n^{-1}\mathbf{h}_{r,n})$
6:     end for
7: end for
8: Output: $\psi_1, \ldots, \psi_N$
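A minimal sketch of Algorithm 9.3 follows. It uses (9.99) and (9.100) directly and, unlike Algorithm 9.2, involves no water-filling. The usage lines assume hypothetical i.i.d. Rayleigh fading everywhere, which is a simplification of the Rician surface-to-base-station channel in Figure 9.22. The scaling makes each cascaded path 20 dB weaker than the static path. The function names are hypothetical.

```python
import numpy as np

def sum_capacity(H, q, N0):
    """Uplink sum capacity (9.93) in bit/symbol."""
    M = H.shape[0]
    return float(np.log2(np.linalg.det(np.eye(M) + (q / N0) * H @ H.conj().T).real))

def algorithm_9_3(H_s, H_r, H_t, q, N0=1.0, L=5, seed=0):
    rng = np.random.default_rng(seed)
    M, N = H_r.shape
    psi = rng.uniform(-np.pi, np.pi, N)                              # step 1
    channel = lambda ph: H_s + H_r @ np.diag(np.exp(1j * ph)) @ H_t  # (9.95)
    history = [sum_capacity(channel(psi), q, N0)]
    for _ in range(L):
        for n in range(N):
            h_r, h_t = H_r[:, n], H_t[n, :]          # nth column of H_r, nth row of H_t
            H_tilde = channel(psi) - np.exp(1j * psi[n]) * np.outer(h_r, h_t)    # (9.97)
            b = (q / N0) * H_tilde @ h_t.conj()                                  # (9.99)
            A = (np.eye(M) + (q / N0) * H_tilde @ H_tilde.conj().T
                 + (q / N0) * np.vdot(h_t, h_t).real * np.outer(h_r, h_r.conj()))  # (9.99)
            psi[n] = -np.angle(b.conj() @ np.linalg.solve(A, h_r))               # (9.100)
        history.append(sum_capacity(channel(psi), q, N0))
    return psi, history

rng = np.random.default_rng(8)
crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
# K = 4 users, M = 10 antennas, N = 50 metaatoms; each cascaded path is 20 dB weaker
psi, history = algorithm_9_3(crandn(10, 4), 0.1 * crandn(10, 50), crandn(50, 4), q=1.0)
print(np.round(history, 3))   # non-decreasing sum capacity values
```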

We have focused on the uplink thus far, but the results are also useful for 
the downlink because the uplink-downlink duality implies that we can achieve 
the same user rates in both directions. Hence, if the surface is configured to 
provide a high uplink sum capacity, we can achieve the same downlink sum rate 
using the same power without changing the surface configuration. However, 
this will generally not be the downlink sum capacity because we might have a 
different total downlink transmit power and can allocate it arbitrarily between 
the users. The downlink sum capacity was stated in (6.122) as the problem 
of maximizing the sum rate in the virtual uplink with respect to the virtual 
uplink powers, and it can be solved efficiently using convex optimization tools. 
It is straightforward to devise an iterative algorithm that switches between 
solving (6.122) for given user channels and enhancing the channels using 
Algorithm 9.3 for given virtual uplink powers. 

The uplink sum capacity requires SIC, while the downlink sum capacity is achieved using DPC. It is easier to implement uplink and downlink multi-user MIMO systems with linear signal processing, but unfortunately, this comes at the price of more complex parameter optimization problems. For example, the uplink sum capacity is achieved when all users transmit with their maximum power, while the maximum uplink sum rate with linear combining requires power control optimization (as exemplified in Section 6.3.6). Similarly, the parameters required to achieve the downlink sum capacity are obtained by solving the convex optimization problem in (6.122), while the linear precoding that maximizes the downlink sum rate can only be computed using high-complexity global optimization algorithms [84], [85]. The structure of these rate expressions increases the complexity and makes phase-shift optimization more complicated when a reconfigurable surface supports a multi-user MIMO system that employs linear processing. We refer to [155], [171], [172] for further details and solutions to these problems. The bottom line is that reconfigurable surfaces can improve the user rates in multi-user MIMO systems, and phase-shift optimization algorithms can be developed for many utility functions and types of transmit and receive processing.

Figure 9.22: The sum capacity of an uplink multi-user MIMO system that is aided by a reconfigurable surface with a varying number of metaatoms. The phase-shift configuration is either selected randomly or by running Algorithm 9.3 with L = 1 or L = 5 iterations.


9.4.3 Enhanced Target Detection 


A reconfigurable surface can also be used to improve the wireless channel properties for sensing applications [173], [174], particularly to increase the SNR and reliability. To exemplify this, we will consider a monostatic target detection scenario, where a multi-antenna radar system must determine whether a target exists at a specific location or not. A reconfigurable surface is deployed in the same area, and there are free-space LOS channels between the different locations, as illustrated in Figure 9.23. The radar transceiver has K transmit antennas and K receive antennas, which are symmetrically arranged to achieve identical array response vectors. The surface consists of N metaatoms.

Figure 9.23: A radar transceiver with K transmit antennas and K receive antennas wants to detect the presence of a target with assistance from a reconfigurable surface with N metaatoms. There are two paths from the transmitter to the target and two paths from the target to the receiver, resulting in four propagation paths. Solid lines represent paths leading to the target and dashed lines are paths leading back to the receiver.

A predefined radar signal is transmitted using the precoding vector $\mathbf{p} \in \mathbb{C}^K$, and it reaches the target location in two ways: through the direct LOS path or via reflection by the surface. If the target exists, it will reflect the signals, and these can reach the receiver either through the direct LOS path or via reflection by the surface. This gives rise to a total of four propagation paths from the transmitter to the receiver. We let $\mathbf{h}_s \in \mathbb{C}^K$ denote the static LOS channel between the transmitter and target location. Furthermore, the cascaded channel from the radar to the target via the reconfigurable surface is represented by the vector

$$\mathbf{h}_c = \mathbf{a}_r\mathbf{b}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_t, \tag{9.101}$$

where $\mathbf{a}_r\mathbf{b}_r^{\mathrm{T}} \in \mathbb{C}^{K \times N}$ is the rank-one LOS channel matrix between the radar transceiver and surface, $\mathbf{D}_\psi \in \mathbb{C}^{N \times N}$ is the reflection matrix, and $\mathbf{h}_t \in \mathbb{C}^N$ is the channel between the surface and target. For notational simplicity, we will not include any channel gains in $\mathbf{h}_s$, $\mathbf{a}_r$, $\mathbf{b}_r$, and $\mathbf{h}_t$ but model them separately. Hence, these are four array response vectors that describe the LOS propagation between the different locations, which implies that the squared norm of each vector equals the number of entries it has.
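If the target exists, the effective end-to-end channel to the receiver is

$$\mathbf{h} = \Big(\underbrace{c_1\mathbf{h}_s\mathbf{h}_s^{\mathrm{T}}}_{\text{LOS path}} + \underbrace{c_2\mathbf{h}_c\mathbf{h}_c^{\mathrm{T}}}_{\text{Via surface}} + \underbrace{c_3\mathbf{h}_s\mathbf{h}_c^{\mathrm{T}} + c_3\mathbf{h}_c\mathbf{h}_s^{\mathrm{T}}}_{\text{Mix of LOS and surface paths}}\Big)\mathbf{p}, \tag{9.102}$$

where we included the precoding vector and $c_1 \sim \mathcal{N}_{\mathbb{C}}(0, \beta_1)$, $c_2 \sim \mathcal{N}_{\mathbb{C}}(0, \beta_2)$, and $c_3 \sim \mathcal{N}_{\mathbb{C}}(0, \beta_3)$ are three independent RCS realizations for the target, which include the channel gains as well. Multiple realizations are required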
because we consider signals reaching and leaving the target in different direc- 
tions. However, the coefficient c3 appears twice due to channel reciprocity, 
which implies that the RCS is the same when the signal propagates from the 
transmitter to the target and back via the surface, and when the signal travels 
in the opposite direction. There are four terms in (9.102) representing the four 
propagation paths from the transmitter to the receiver. The first term is the 
direct reflection by the target that would also happen in the absence of the 
surface. The second term is the path that reaches the target via the surface 
and then goes back to the receiver in the same way. The third term is the 
path that reaches the target via the surface and then is reflected through the 
LOS path, while the fourth term takes the opposite direction. The variances $\beta_1, \beta_2, \beta_3$ are generally different because they include the product of the channel gains between the different locations that the radar signal passes on its way from the transmitter to the receiver. We can expect that $\beta_1 > \beta_3 > \beta_2$
since the LOS path is typically stronger than the path via a single metaatom; 
however, with an appropriate surface configuration, the combined effect of 
the N metaatoms can make a large difference for target detection. 

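We assume that the signal $\sqrt{P}$ is transmitted, denote the received signal by $\mathbf{y} \in \mathbb{C}^K$, and formulate the binary hypothesis test

$$\mathcal{H}_0: \mathbf{y} = \mathbf{n}, \tag{9.103}$$

$$\mathcal{H}_1: \mathbf{y} = \sqrt{P}\mathbf{h} + \mathbf{n}, \tag{9.104}$$

where $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2\mathbf{I}_K)$ is the additive noise vector.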
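If the hypothesis $\mathcal{H}_1$ is true, the channel covariance matrix is

$$\begin{aligned}
\mathbf{R} = \mathbb{E}\{\mathbf{h}\mathbf{h}^{\mathrm{H}}\} &= \beta_1|\mathbf{h}_s^{\mathrm{T}}\mathbf{p}|^2\mathbf{h}_s\mathbf{h}_s^{\mathrm{H}} + \beta_2|\mathbf{h}_c^{\mathrm{T}}\mathbf{p}|^2\mathbf{h}_c\mathbf{h}_c^{\mathrm{H}} \\
&\quad + \beta_3\left((\mathbf{h}_c^{\mathrm{T}}\mathbf{p})\mathbf{h}_s + (\mathbf{h}_s^{\mathrm{T}}\mathbf{p})\mathbf{h}_c\right)\left((\mathbf{h}_c^{\mathrm{T}}\mathbf{p})\mathbf{h}_s + (\mathbf{h}_s^{\mathrm{T}}\mathbf{p})\mathbf{h}_c\right)^{\mathrm{H}}.
\end{aligned} \tag{9.105}$$

This matrix consists of three terms, where the first term only uses the LOS path while the remaining two terms are created by the reconfigurable surface. Each term has rank one because it is an outer product of vectors. All terms lie in the span of $\mathbf{h}_s$ and $\mathbf{h}_c$, so $\mathbf{R}$ has rank at most two.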

The precoding and surface configuration can be selected to optimize this covariance matrix. The reflection matrix $\mathbf{D}_\psi$ only scales the cascaded channel vector $\mathbf{h}_c$ in (9.101) since $\mathbf{b}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_t$ is a scalar. We showed in Section 9.2 that the magnitude of this term is maximized by (9.27), where the phase-shifts ensure that we sum up N phase-aligned terms. Since $\mathbf{h}_t$ and $\mathbf{b}_r$ are array response vectors where each entry has unit magnitude, it follows that $\mathbf{b}_r^{\mathrm{T}}\mathbf{D}_\psi\mathbf{h}_t = N$ when using the optimal configuration.

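The precoding vector should be a linear combination of $\mathbf{h}_s^{*}$ and $\mathbf{h}_c^{*} = N\mathbf{a}_r^{*}$ since these are the two transmission directions that lead to the target. To maximize the average SNR, we can select the precoding vector that maximizes

$$\begin{aligned}
\mathbb{E}\{\|\mathbf{h}\|^2\} &= \mathrm{tr}(\mathbf{R}) \\
&= \beta_1|\mathbf{h}_s^{\mathrm{T}}\mathbf{p}|^2\|\mathbf{h}_s\|^2 + \beta_2|\mathbf{h}_c^{\mathrm{T}}\mathbf{p}|^2\|\mathbf{h}_c\|^2 + \beta_3\left\|(\mathbf{h}_c^{\mathrm{T}}\mathbf{p})\mathbf{h}_s + (\mathbf{h}_s^{\mathrm{T}}\mathbf{p})\mathbf{h}_c\right\|^2 \\
&= \mathbf{p}^{\mathrm{H}}\Big((\beta_1\|\mathbf{h}_s\|^2 + \beta_3\|\mathbf{h}_c\|^2)\mathbf{h}_s^{*}\mathbf{h}_s^{\mathrm{T}} + (\beta_2\|\mathbf{h}_c\|^2 + \beta_3\|\mathbf{h}_s\|^2)\mathbf{h}_c^{*}\mathbf{h}_c^{\mathrm{T}} \\
&\qquad + \beta_3(\mathbf{h}_s^{\mathrm{H}}\mathbf{h}_c)\mathbf{h}_c^{*}\mathbf{h}_s^{\mathrm{T}} + \beta_3(\mathbf{h}_c^{\mathrm{H}}\mathbf{h}_s)\mathbf{h}_s^{*}\mathbf{h}_c^{\mathrm{T}}\Big)\mathbf{p}.
\end{aligned} \tag{9.106}$$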


This is a quadratic form with respect to the precoding vector, with a Hermitian matrix in the middle. Hence, under the constraint $\|\mathbf{p}\| = 1$, it is maximized when $\mathbf{p}$ is selected as the unit-length eigenvector associated with the largest eigenvalue of this matrix.
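The eigenvector selection can be sketched numerically. This sketch assumes a half-wavelength ULA response with entries $e^{j\pi k\sin\varphi}$, k = 0, ..., K−1. That convention is a common choice but is not stated in this section. It also assumes the optimal surface configuration, so that $\mathbf{h}_c = N\mathbf{a}_r$. The angles and β-values follow the Figure 9.24 description given later in the section, and all function names are hypothetical. The code builds the Hermitian matrix of (9.106), takes its dominant eigenvector, and checks that $\mathrm{tr}(\mathbf{R})$ computed directly from (9.105) matches the largest eigenvalue.

```python
import numpy as np

def ula(K, phi):
    """Assumed half-wavelength ULA response with entries exp(j*pi*k*sin(phi))."""
    return np.exp(1j * np.pi * np.arange(K) * np.sin(phi))

def precoder_matrix(h_s, h_c, b1, b2, b3):
    """Hermitian matrix in the quadratic form (9.106)."""
    ns, nc = np.vdot(h_s, h_s).real, np.vdot(h_c, h_c).real
    return ((b1 * ns + b3 * nc) * np.outer(h_s.conj(), h_s)
            + (b2 * nc + b3 * ns) * np.outer(h_c.conj(), h_c)
            + b3 * np.vdot(h_s, h_c) * np.outer(h_c.conj(), h_s)
            + b3 * np.vdot(h_c, h_s) * np.outer(h_s.conj(), h_c))

def trace_R(p, h_s, h_c, b1, b2, b3):
    """E{||h||^2} = tr(R) evaluated directly from (9.105)."""
    s, c = h_s @ p, h_c @ p                    # h_s^T p and h_c^T p
    v = c * h_s + s * h_c
    return float(b1 * abs(s) ** 2 * np.vdot(h_s, h_s).real
                 + b2 * abs(c) ** 2 * np.vdot(h_c, h_c).real
                 + b3 * np.vdot(v, v).real)

K, N = 10, 100
h_s, h_c = ula(K, 0.0), N * ula(K, np.pi / 6)  # h_c = N a_r with the optimal surface
b1, b2, b3 = 1.0, N ** -4, N ** -2             # beta_1 = beta_2 N^4 = beta_3 N^2
G = precoder_matrix(h_s, h_c, b1, b2, b3)
eigvals, eigvecs = np.linalg.eigh(G)           # eigenvalues in ascending order
p = eigvecs[:, -1]                             # unit-norm dominant eigenvector
print(trace_R(p, h_s, h_c, b1, b2, b3), eigvals[-1])  # the two values coincide
```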

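With the optimized precoding vector and surface configuration described
above, the covariance matrix $\mathbf{R}$ takes a particular value that we denote
as $\bar{\mathbf{R}}$. Based on this matrix, we can derive the Neyman-Pearson detector that
gives a desired false alarm probability $P_{\mathrm{FA}} = \alpha$, following the approach from
Section 8.3.2. In particular, $\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2\mathbf{I}_K)$ under the hypothesis $\mathcal{H}_0$ and
$\mathbf{y} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, P\bar{\mathbf{R}} + \sigma^2\mathbf{I}_K)$ under the hypothesis $\mathcal{H}_1$. Lemma 2.14 says that we
should decide on the hypothesis $\mathcal{H}_1$ if

$$
\frac{f_{\mathbf{y}|\mathcal{H}_1}(\mathbf{y}|\mathcal{H}_1)}{f_{\mathbf{y}|\mathcal{H}_0}(\mathbf{y}|\mathcal{H}_0)} = \frac{\frac{1}{\pi^K \det(P\bar{\mathbf{R}} + \sigma^2\mathbf{I}_K)} e^{-\mathbf{y}^{\mathrm{H}}(P\bar{\mathbf{R}} + \sigma^2\mathbf{I}_K)^{-1}\mathbf{y}}}{\frac{1}{\pi^K \det(\sigma^2\mathbf{I}_K)} e^{-\frac{1}{\sigma^2}\mathbf{y}^{\mathrm{H}}\mathbf{y}}} > \gamma. \tag{9.107}
$$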


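We can rewrite this condition by taking the natural logarithm of both sides. This preserves
the inequality because $\ln(x)$ is a monotonically increasing function for $x > 0$:

$$
\ln(\gamma) - \ln(b) < \frac{1}{\sigma^2}\mathbf{y}^{\mathrm{H}}\mathbf{y} - \mathbf{y}^{\mathrm{H}}(P\bar{\mathbf{R}} + \sigma^2\mathbf{I}_K)^{-1}\mathbf{y}, \tag{9.108}
$$

where the constant $b = \det(\sigma^2\mathbf{I}_K)/\det(P\bar{\mathbf{R}} + \sigma^2\mathbf{I}_K)$ is independent of the
received signal $\mathbf{y}$. Hence, the Neyman-Pearson detector decides on $\mathcal{H}_1$ if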


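$$
\|\mathbf{y}\|^2 - \mathbf{y}^{\mathrm{H}}\left(\frac{P}{\sigma^2}\bar{\mathbf{R}} + \mathbf{I}_K\right)^{-1}\mathbf{y} > \underbrace{\sigma^2\left(\ln(\gamma) - \ln(b)\right)}_{= \gamma'}, \tag{9.109}
$$

where $\gamma'$ is the revised threshold variable that must be selected so that

$$
P_{\mathrm{FA}} = \alpha = \int_{\|\mathbf{y}\|^2 - \mathbf{y}^{\mathrm{H}}\left(\frac{P}{\sigma^2}\bar{\mathbf{R}} + \mathbf{I}_K\right)^{-1}\mathbf{y} > \gamma'} f_{\mathbf{y}|\mathcal{H}_0}(\mathbf{y}|\mathcal{H}_0)\, d\mathbf{y}. \tag{9.110}
$$

The sufficient statistic for target detection is $\|\mathbf{y}\|^2 - \mathbf{y}^{\mathrm{H}}\left(\frac{P}{\sigma^2}\bar{\mathbf{R}} + \mathbf{I}_K\right)^{-1}\mathbf{y}$. It
is affected by the precoding and surface configuration through $\bar{\mathbf{R}}$.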
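The integral in (9.110) does not have an obvious closed form. One practical option is therefore to calibrate $\gamma'$ by Monte Carlo simulation under $\mathcal{H}_0$. The Python sketch below:

- computes the sufficient statistic;
- sets $\gamma'$ as the empirical $(1-\alpha)$ quantile of that statistic under $\mathcal{H}_0$;
- estimates $P_{\mathrm{D}}$ under $\mathcal{H}_1$.

The following are illustrative assumptions, not the configuration behind Figure 9.24: the matrix $\bar{\mathbf{R}}$ (a rank-two sum of two assumed array-response outer products), $P/\sigma^2 = 1$, the sample size, and the function names.

```python
import numpy as np

def np_statistic(Y, R_bar, snr):
    """Row-wise ||y||^2 - y^H (snr * R_bar + I)^{-1} y, where snr = P / sigma^2."""
    M = snr * R_bar + np.eye(R_bar.shape[0])
    Z = np.linalg.solve(M, Y.T)                    # each column is M^{-1} y
    quad = np.sum(Y.conj().T * Z, axis=0).real     # y^H M^{-1} y for each sample
    return np.sum(np.abs(Y) ** 2, axis=1) - quad

def sample_cscg(cov, n, rng):
    """Draw n samples (one per row) of y ~ N_C(0, cov) using a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    K = cov.shape[0]
    Z = (rng.standard_normal((n, K)) + 1j * rng.standard_normal((n, K))) / np.sqrt(2)
    return Z @ L.T                                 # row i equals (L z_i)^T

rng = np.random.default_rng(3)
K, n, alpha = 10, 100_000, 1e-3
snr, sigma2 = 1.0, 1.0                             # P / sigma^2 and noise variance
k = np.arange(K)
a0 = np.exp(1j * np.pi * k * np.sin(0.0))          # assumed array responses
a1 = np.exp(1j * np.pi * k * np.sin(np.pi / 6))
R_bar = np.outer(a0, a0.conj()) + np.outer(a1, a1.conj())   # rank-two placeholder

# Calibrate gamma' as the empirical (1 - alpha) quantile of the statistic under H0.
T0 = np_statistic(sample_cscg(sigma2 * np.eye(K), n, rng), R_bar, snr)
gamma_prime = np.quantile(T0, 1 - alpha)

# Check P_FA on fresh H0 data and estimate P_D under H1, where cov = P*R_bar + sigma^2*I.
T0_new = np_statistic(sample_cscg(sigma2 * np.eye(K), n, rng), R_bar, snr)
cov_h1 = snr * sigma2 * R_bar + sigma2 * np.eye(K)
T1 = np_statistic(sample_cscg(cov_h1, n, rng), R_bar, snr)
p_fa, p_d = np.mean(T0_new > gamma_prime), np.mean(T1 > gamma_prime)
print(f"P_FA ~ {p_fa:.4f}, P_D ~ {p_d:.3f}")
```

The statistic is never negative, because $(\frac{P}{\sigma^2}\bar{\mathbf{R}} + \mathbf{I}_K)^{-1} \preceq \mathbf{I}_K$. The fresh $\mathcal{H}_0$ samples only serve as a sanity check that the calibrated threshold gives roughly the target false alarm probability.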

Figure 9.24: The detection probability with respect to the reference SNR in a setup with or
without a reconfigurable surface. The surface either has a random phase-shift configuration or is
optimized to maximize the received power.

Figure 9.24 shows the detection probability, $P_{\mathrm{D}}$, versus the reference
SNR. The reference SNR is the SNR obtained if the radar transceiver has a single antenna
and there is no reconfigurable surface. The Neyman-Pearson detector is used with the false
alarm probability $P_{\mathrm{FA}} = \alpha = 10^{-3}$. The radar transceiver is equipped with a
half-wavelength-spaced ULA with $K = 10$ transmit and receive antennas. The
target is located in the direction $\varphi = 0$ seen from the transceiver, while a
reconfigurable surface with $N = 100$ elements is seen in the direction $\varphi = \pi/6$.
We let $\beta_1 = \beta_2 N^4 = \beta_3 N^2$, so all the propagation paths are equally strong
when the surface is optimally configured.

The solid black curve shows the detection performance without the surface. In this
case, MRT is the optimal precoding. The dashed red curve is obtained when the
reconfigurable surface is added to the setup, but with a random configuration, while the
precoding still points the signal directly toward the target. The detection probability
is improved, but the effect is negligible since the extra paths are weak. Large
improvements only appear when the precoding and surface configuration are jointly
optimized. The dash-dotted blue curve represents this case and is shifted by roughly
4 dB to the left compared to the original black curve. This is explained by the fact that
the total received power $P\,\mathrm{tr}(\bar{\mathbf{R}})$ is increased by 3.9 dB. The blue curve is also
steeper, thanks to the spatial diversity gain obtained by having three random RCS
coefficients instead of one.
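The 3.9 dB increase in received power can be reproduced approximately with a short computation. The assumptions are ours and are not stated in the text:

- $\mathbf{h}_s = \mathbf{a}(0)$ and, after optimal surface configuration, $\mathbf{h}_c = N\mathbf{a}(\pi/6)$;
- the half-wavelength ULA response is $[\mathbf{a}(\varphi)]_k = e^{j\pi k\sin(\varphi)}$;
- $\beta_1 = 1$, since only the ratios $\beta_1 = \beta_2N^4 = \beta_3N^2$ matter.

The sketch compares the largest eigenvalue of the matrix in (9.106) with the value $\beta_1\|\mathbf{h}_s\|^4$ obtained by MRT without the surface. The function names are hypothetical.

```python
import numpy as np

def ula_response(K, phi):
    """Assumed half-wavelength ULA response: [a(phi)]_k = exp(j*pi*k*sin(phi))."""
    return np.exp(1j * np.pi * np.arange(K) * np.sin(phi))

def quadratic_form_matrix(h_s, h_c, b1, b2, b3):
    """Hermitian matrix in the last line of (9.106), so that tr(R) = p^H A p."""
    ns, nc = np.linalg.norm(h_s) ** 2, np.linalg.norm(h_c) ** 2
    return ((b1 * ns + b3 * nc) * np.outer(h_s.conj(), h_s)
            + (b2 * nc + b3 * ns) * np.outer(h_c.conj(), h_c)
            + b3 * np.vdot(h_s, h_c) * np.outer(h_c.conj(), h_s)
            + b3 * np.vdot(h_c, h_s) * np.outer(h_s.conj(), h_c))

K, N = 10, 100
b1 = 1.0
b2, b3 = b1 / N**4, b1 / N**2               # beta_1 = beta_2 N^4 = beta_3 N^2
h_s = ula_response(K, 0.0)                  # direct path to the target
h_c = N * ula_response(K, np.pi / 6)        # cascaded path with optimal surface (assumed)

A = quadratic_form_matrix(h_s, h_c, b1, b2, b3)
power_with_surface = np.linalg.eigvalsh(A)[-1]          # optimal precoding, eq. (9.106)
power_without_surface = b1 * np.linalg.norm(h_s) ** 4   # MRT, no surface
gain_db = 10 * np.log10(power_with_surface / power_without_surface)
print(round(gain_db, 2))  # 3.88
```

In this setup, the largest eigenvalue works out to $202 + 30\sqrt{2} \approx 244.4$, compared with $100$ without the surface. This ratio is consistent with the 3.9 dB stated above, under the listed assumptions.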
Beyond this basic example, there are many other MIMO radar system 
configurations and more complex propagation channels where a reconfigurable 
surface can enhance detection performance. The expected gains are created 
by ensuring that a larger fraction of the transmitted power reaches the target 
location and is then reflected toward the receivers, as well as by creating extra 
spatially distinguishable paths that provide diversity against randomness and 
improved spatial resolution. We refer to [175], [176] for further details. 
9.5 Exercises


Exercise 9.1. Consider the setup from Example 9.4 with equal propagation losses to
all elements, such that the end-to-end channel gain is $(\sqrt{\beta_s} + N\sqrt{\beta_t\beta_r})^2$. How many
metaatoms $N$ are required for the reconfigurable surface to double the received power,
compared to the case of $N = 0$?


(a) Answer the question when $\beta_s = \beta_t = \beta_r = 10^{-8}$.
(b) Answer the question when 6, = 10~'°, 6, = 107°, and br = 107°. 


Exercise 9.2. The end-to-end channel in (9.28) becomes $h = \sum_{n=1}^{N} h_{r,n} e^{j\psi_n} h_{t,n}$ if the
static channel is totally blocked. Suppose the channels are equally strong to/from all
metaatoms: $h_{r,n} = \sqrt{\beta_r}$ and $h_{t,n} = \sqrt{\beta_t}$, $n = 1, \ldots, N$.


(a) What is the average channel gain $\mathbb{E}\{|h|^2\}$ if the phase-shifts $\psi_n$ are selected as
independent random variables that are uniformly distributed between $0$ and $2\pi$?


(b) What will $|h|^2$ become if the phase-shifts are selected to maximize it?


(c) Compare the results in (a) and (b). What kind of gain is missing in (a)? 


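Exercise 9.3. The end-to-end channel gain in (9.25) becomes $|\mathbf{h}_r^{\mathrm{T}}\mathbf{D}_{\boldsymbol{\psi}}\mathbf{h}_t|^2$ if the static
channel is totally blocked by some objects (e.g., at high frequencies). Suppose
$\mathbf{h}_r \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I}_N)$ and $\mathbf{h}_t \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \beta_t\mathbf{I}_N)$ and they are independent.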


(a) What is the average channel gain with a static surface with $\mathbf{D}_{\boldsymbol{\psi}} = \mathbf{I}_N$?


(b) What is the average channel gain with a reconfigurable surface that is configured
to maximize the channel gain? Hint: $\mathbb{E}\{|h_{t,n}|\} = \sqrt{\beta_t\pi/4}$.


Exercise 9.4. The LOS end-to-end channel gain is stated in (9.35) when the static
channel is negligible. Suppose the transmitter and receiver are equipped with isotropic
antennas, while each metaatom has the effective area $A_m = (\lambda/4)^2$.


(a) Determine an expression of the end-to-end channel gain when the distance between
the transmitter and the surface is $d_t$ and the distance between the surface and
the receiver is $d_r$.


(b) How many metaatoms are needed to achieve an end-to-end channel gain of 107° 
if the wavelength is $\lambda = 0.1$ m (i.e., 3 GHz), $d_t = 50$ m, and $d_r = 2$ m? How large
is the total area $NA_m$ of the surface?


(c) How many metaatoms are needed to achieve the same channel gain as in (b) when
$\lambda = 0.01$ m (i.e., 30 GHz)? How large is the total area $NA_m$ of the surface?


Exercise 9.5. Suppose the reconfigurable surface can turn off specific metaatoms so they 
absorb all incident signal energy instead of reflecting anything. 


(a) Use this feature to estimate each of the cascaded channels $h_{r,n}h_{t,n}$ sequentially
while the remaining $N - 1$ metaatoms are turned off. Follow the ML estimation
framework in Section 9.2.2 and assume that $h_s = 0$.


(b) Suppose all metaatoms are turned on during the ML estimation and simplify the
ML estimator in (9.42) for the case when $h_s = 0$.


(c) Show that the ML estimate can be expressed as $\hat{\mathbf{h}} = \mathbf{h} + \text{effective noise}$ in both
(a) and (b). Compare the variances of the noise terms. Is it preferable to turn
metaatoms on/off during the channel estimation? Explain the result.
Exercise 9.6. Consider a reconfigurable surface designed as a uniform planar array with
$N_H$ columns and $N_V$ metaatoms per column. Suppose a plane wave impinges from the
direction $\varphi_i$ in the azimuth plane and should be reflected towards a user in the direction
$\varphi_o$ in the azimuth plane (i.e., $\theta_i = \theta_o = 0$).


(a) Prove that the cascaded channel coefficient $h_{r,n}h_{t,n}$ is equal for all the $N_V$
metaatoms located in the same column.


(b) Based on the property proved in (a), if we deploy the reconfigurable surface
in an environment where signals only propagate in the azimuth plane, we can
reduce the number of phase-shift variables from $N_H N_V$ to $N_H$. This is achieved
by assigning the same phase-shift to metaatoms in the same column. Write up
the corresponding end-to-end channel and factorize it similarly to (9.38).


(c) Write up a new ML estimator that utilizes the new factorization from (b). Show
that the minimum pilot length is now $L_p = N_H + 1$.


(d) Suppose the same total energy $(N + 1)q$ is utilized for pilot transmission with
the new ML estimator as with the one considered in (9.44). How much smaller total
variance will the scaled noise term have with the new estimator?


Exercise 9.7. A classic way of extending wireless coverage (e.g., into tunnels) is to use a 
repeater that picks up the signal using one antenna and immediately retransmits an 
amplified version using another antenna. In this exercise, it will be compared with a 
reconfigurable surface in the same deployment scenario. 


(a) The received signal at the repeater is $y_1 = \sqrt{\beta_t}x_1 + n_1$ and the received signal
at the receiver is $y_2 = \sqrt{\beta_r}x_2 + n_2$, where $n_1, n_2 \sim \mathcal{N}_{\mathbb{C}}(0, N_0)$. Suppose the data
signal $x_1 \sim \mathcal{N}_{\mathbb{C}}(0, q_1)$ is transmitted and that the repeater sends $x_2 = \sqrt{a}\,y_1$, where
$a$ is the amplification gain. How should $a$ be selected to ensure that $\mathbb{E}\{|x_2|^2\} = q_2$?


(b) What is the SNR at the receiver when using the repeater with the amplification 
gain obtained in (a)? 


(c) If a reconfigurable surface is used in the same scenario, the SNR would be
$qN^2\beta_t\beta_r/N_0$. Derive an expression for how many metaatoms $N$ are required to
achieve a larger SNR with the surface than with the repeater.


(d) Compute the number of metaatoms in (c) if 6, = 8, = 1078 and q/No = 10°. To 
make the total transmit power the same in both setups, we let $q_1 = q_2 = q/2$.


Exercise 9.8. The end-to-end channel gain in (9.31) with an optimal phase-shift
configuration is $\|\mathbf{h}\|_1^2$, where the 1-norm is used.


(a) A MISO channel with the same channel vector achieves the channel gain $\|\mathbf{h}\|^2$,
where the Euclidean norm (2-norm) is used. Which of the two squared norms is
the larger one? Under which conditions are they equal?


(b) The phase-shift vector acts as a beamforming vector with squared norm $N + 1$. If
MRT is used with a precoding vector that has the same squared norm, what will
be the resulting channel gain? Is it larger than $\|\mathbf{h}\|_1^2$?


Exercise 9.9. Derive (9.61) from (9.55) step-by-step by utilizing the two properties
stated after the equation. Hint: Use the commutative and associative properties of the
convolution. The commutative property states that $(f * g)(t) = (g * f)(t)$. The associative
property of the convolution is $(f * g * h)(t) = ((f * g) * h)(t) = (f * (g * h))(t)$.




Exercise 9.10. Consider the reflection coefficient $\Gamma_n$ when the metaatom's impedance
$Z_n$ is given by (9.56).


(a) What does $\Gamma_n$ converge to as $f \to 0$? What is its amplitude and phase?
(b) What does $\Gamma_n$ converge to as $f \to \infty$? What is its amplitude and phase?
(c) Show that $|\Gamma_n| = 1$ for all $f$ if $R = 0$.


Exercise 9.11. If a known pilot signal is transmitted on all subcarriers, the ML estimator
described in Section 9.2.2 can be applied to separately estimate each of the channel
vectors $\mathbf{h}[0], \ldots, \mathbf{h}[S-1]$. However, this is unnecessary because adjacent subcarriers
have similar channel vectors.


(a) Show that h[é] in (9.64) can be expressed as wT h[é], where 7 = [1,el”1,..., el]. 
(b) Relate hy|0],...,hy[T] to h{O],...,h[S — 1] using a DFT matrix. 


(c) Use the property in (b) to determine an ML estimator of h[0],...,h[T], based on 
the received signals from pilot transmission on T + 1 subcarriers. 


Exercise 9.12. Consider a SIMO channel aided by a reconfigurable surface where the
channel vector is

$$
\mathbf{h} = \mathbf{h}_s + \mathbf{H}_r\mathbf{D}_{\boldsymbol{\psi}}\mathbf{h}_t. \tag{9.111}
$$

Suppose there is a free-space LOS channel $\mathbf{H}_r = \mathbf{a}_r\mathbf{b}_r^{\mathrm{T}}$ between the surface and receiver,
where $\mathbf{a}_r, \mathbf{b}_r$ are vectors. Determine the surface configuration that maximizes the capacity.


Exercise 9.13. Consider a point-to-point MIMO system with $M = K$ where the channel
matrix $\mathbf{H} \in \mathbb{C}^{K \times M}$ has full rank. The system operates at high SNR so that equal power
allocation is optimal.


2 
(a) Prove that the high-SNR capacity is upper bounded by M loga (1 + Ey Under 


what conditions on H is the upper bound achieved? Hint: Use the inequality of 
arithmetic and geometric means from Lemma 3.2. 


(b) The channel contains a reconfigurable surface, so the channel matrix is modeled
according to (9.81) as $\mathbf{H} = \mathbf{H}_s + \mathbf{H}_r\mathbf{D}_{\boldsymbol{\psi}}\mathbf{H}_t$. How should the surface be configured
to maximize the high-SNR capacity if $\mathbf{H}_s$, $\mathbf{H}_r$, and $\mathbf{H}_t$ are rank-one matrices?


Exercise 9.14. Propose an algorithm for downlink sum capacity maximization that 
switches between solving (6.122) for given user channels and updating the phase-shifts 
similarly to Algorithm 9.3. 
Mathematical Notation


Upper-case boldface letters are used to denote matrices (e.g., $\mathbf{X}, \mathbf{Y}$), while
column vectors are denoted with lower-case boldface letters (e.g., $\mathbf{x}, \mathbf{y}$). Scalars
are denoted by lower/upper-case italic letters (e.g., $x, y, X, Y$) and sets by
calligraphic letters (e.g., $\mathcal{X}, \mathcal{Y}$).
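The following general mathematical notations are used:

$\mathbb{R}$  The space of real-valued numbers
$\mathbb{C}$  The space of complex-valued numbers
$\mathbb{R}^N$  The space of real-valued $N$-dimensional vectors
$\mathbb{C}^N$  The space of complex-valued $N$-dimensional vectors
$\mathbb{C}^{N \times M}$  The set of complex-valued $N \times M$ matrices
$\mathcal{A} = \{a_1, \ldots, a_N\}$  A set with the members $a_1, \ldots, a_N$
$x \in \mathcal{A}$  $x$ is a member of the set $\mathcal{A}$
$x \notin \mathcal{A}$  $x$ is not a member of the set $\mathcal{A}$
$\mathcal{A} \subset \mathcal{B}$  $\mathcal{A}$ is a subset of $\mathcal{B}$
$\{(R_1, R_2) : \text{cond}\}$  The set of all $(R_1, R_2)$ that satisfy the condition
$[\mathbf{x}]_i$  The $i$th entry of a vector $\mathbf{x}$
$[\mathbf{X}]_{i,j}$  The $(i, j)$th entry of a matrix $\mathbf{X}$
$\mathrm{diag}(d_1, \ldots, d_N)$  Diagonal matrix with $d_1, \ldots, d_N$ on the diagonal
$\mathbf{I}_M$  The $M \times M$ identity matrix
$\mathbf{0}$  A matrix with only zeros with matching size
$\mathbf{X}^*$  The entry-wise complex conjugate of $\mathbf{X}$
$\mathbf{X}^{\mathrm{T}}$  The transpose of $\mathbf{X}$
$\mathbf{X}^{\mathrm{H}}$  The conjugate/Hermitian transpose of $\mathbf{X}$
$\mathbf{X}^{-1}$  The inverse of a square matrix $\mathbf{X}$
$\mathbf{X}^{1/2}$  The square root of a square matrix $\mathbf{X}$
$\mathrm{tr}(\mathbf{X})$  Trace of a square matrix $\mathbf{X}$
$\det(\mathbf{X})$  Determinant of a square matrix $\mathbf{X}$
$\mathbf{x} \odot \mathbf{y}$  Entry-wise (Hadamard) product of $\mathbf{x}, \mathbf{y}$
$\mathbf{X} \otimes \mathbf{Y}$  Kronecker product of $\mathbf{X}$ and $\mathbf{Y}$
$\|\mathbf{x}\|$  The Euclidean norm $\|\mathbf{x}\| = \sqrt{\sum_i |[\mathbf{x}]_i|^2}$ of $\mathbf{x}$
$\|\mathbf{X}\|_F$  The Frobenius norm of $\mathbf{X}$, defined in (5.87)
$\Re(x), \Im(x)$  Real part and imaginary part of $x$
$j$  The imaginary unit $\sqrt{-1}$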
$\sqrt{x}$  The square root of $x$
$\sqrt[n]{x}$  The $n$th root $x^{1/n}$ of $x \geq 0$
$|x|$  Magnitude (or absolute value) of a scalar $x$
$\arg(x)$  The phase in $[-\pi, \pi)$ of complex number $x$
$\lfloor x \rfloor$  Closest integer smaller than or equal to $x$
$\lceil x \rceil$  Closest integer greater than or equal to $x$
$n!$  The factorial function for positive integers $n$
$e$  Euler's number ($e \approx 2.71828$)
$\min(x, y)$  The minimum of $x$ and $y$
$\max(x, y)$  The maximum of $x$ and $y$
$\bmod S$  Modulo operation (the remainder after division by $S$)
   Wraps $x$ within the range $(-1, 1)$, see (5.194)
$\log_a(x)$  The logarithm of $x$ using the base $a > 0$
$\ln(x)$  The natural logarithm of $x$ (base $e$)
$\sin(x), \cos(x)$  The sine and cosine functions of $x$
$\tan(x)$  The tangent function of $x$
$\arcsin(x)$  The inverse sine function
$\arctan(x)$  The inverse tangent function
   The complex exponential function of $x$
$\mathrm{sinc}(x)$  The sinc function $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$
$\delta(\cdot)$  The Dirac delta function
   Cyclic convolution of the discrete sequences $f[k], g[k]$
   Fourier transform of the continuous function $x(t)$
   DFT of the discrete sequence $x[s]$
$\sim$  Means "distributed as"
$\mathcal{N}(\mu, \sigma^2)$  Gaussian distribution with mean $\mu$ and variance $\sigma^2$
$\mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{R})$  Circularly symmetric complex Gaussian distribution with zero mean and covariance matrix $\mathbf{R}$
$\mathrm{Rayleigh}(\sigma)$  Rayleigh distribution with the scale parameter $\sigma > 0$
$\mathrm{Rice}(\nu, \sigma)$  Rician distribution with the parameters $\nu, \sigma > 0$
$\mathrm{Exp}(\lambda)$  Exponential distribution with the rate $\lambda > 0$
$\chi^2(N)$  Chi-squared distribution with $N$ degrees of freedom
$U[a, b]$  Uniform distribution between $a$ and $b$
$\mathbb{E}\{x\}$  The mean of a random variable $x$
$\mathrm{Var}\{x\}$  The variance of a random variable $x$
$\mathrm{Cov}\{\mathbf{x}\}$  The covariance matrix of a random vector $\mathbf{x}$
$\Pr\{\text{cond}\}$  The probability that the condition "cond" is satisfied
$H(y)$  The differential entropy of $y$, see (2.134)
$H(y|x)$  The conditional differential entropy, see (2.135)
$I(x; y)$  The mutual information between $x$ and $y$
$\mathcal{H}_0, \mathcal{H}_1$  The null hypothesis and alternative hypothesis
Specific Notation


Many variables are used in the different chapters and some names are used 
for multiple purposes. The following is a non-exhaustive list of such notation: 


Qi 
a(p),am(ẹ) 
am (9, 0) 
aMy,My (9, 0) 


D, Dy 


vpr 


AS moots 

ee D p> 
2 Mh T 
3 


Dya 
> 
3 


Attenuation of path i 

Array response vector of a ULA in 2D 

Array response vector of a ULA in 3D 

Array response vector of a UPA 

Area of an isotropic antenna [m²]

Area of a metaatom [m²]

Effective area function of an antenna [m²]

The signal bandwidth [Hz] and symbol rate [symbol/s] 
The channel gain 

The speed of light in free space (vacuum) [m/s] 

The capacity of a channel [bit/s] or [bit/symbol]

The $\epsilon$-outage capacity of a channel [bit/s] or [bit/symbol]
The single-user capacity [bit/s] 

Capacity function in (6.7) [bit/s] 

The propagation distance (for paths or to antennas) 
The aperture length and normalized length $D_\lambda = D/\lambda$
A diagonal matrix, often from the SVD 

Reflection matrix in (9.17) of a reconfigurable surface 
The antenna spacing and normalized spacing $\Delta_\lambda = \Delta/\lambda$
The sampling delay at the receiver [s] 

A frequency variable [Hz] 

Carrier frequency of the signal [Hz] 

The PDF of a random variable x 

The CDF of a random variable x 

The S x S DFT matrix defined in (2.198) 

Antenna gain function 

A scalar channel coefficient 

A SIMO/MISO channel vector 

A MIMO channel matrix 

Estimates of a channel scalar/vector/matrix 
Estimation errors of a channel scalar/vector/matrix 
Beamspace representation of the channel matrix 
Horizontal/vertical indices in a UPA, see (4.124)-(4.125)
Number of transmit or receive antennas 

XPD-related variable, see (4.171) 

Wavelength [m] or an eigenvalue of a matrix 

Number of variables in different contexts 

Vertical and horizontal length of a UPA 

Length of a coherence block and pilot length [symbols] 
$M_V, M_H$
$\mathrm{MSE}_x$

N 

No 

Na, Npath 
Nrr 


p(t) 


p 

$\mathbf{P}, \mathbf{P}_{\mathrm{BB}}, \mathbf{P}_{\mathrm{RF}}$
P, P; 

P, 

pui Pal 

$P_{\mathrm{D}}$

$P_{\mathrm{FA}}$

Pu 

Pregl 0) 
Pcapon (9, 0) 
Pout (R) 

P, Pt, Pr, Pis, Po 
Pbeam; Obeam 


Ti 

0, 0s, Or, 0i, bo 

T 

To 

U 

U,V 

WwW 

$\mathbf{W}, \mathbf{W}_{\mathrm{BB}}, \mathbf{W}_{\mathrm{RF}}$


Number of antennas per column and row in a UPA 
The MSE of an estimate of x 

Number of metaatoms in a reconfigurable surface 
Noise power spectral density [W/Hz] 

Number of clusters/paths in a multipath channel 
Number of RF inputs/outputs in hybrid beamforming 
The pulse used in PAM, often $p(t) = \sqrt{B}\,\mathrm{sinc}(Bt)$
A transmit precoding vector 

Arbitrary digital and hybrid precoding matrices 
Transmitted signal power [W] 

Received signal power [W] 

Transmit power in uplink/downlink 

The correct detection probability 

The false alarm probability 

The missing probability 

Power spectrum with conventional beamforming 
Power spectrum with Capon beamforming 

The outage probability given the rate R 
Azimuth angle 

The beam direction 

Phase-shift of metaatom n 

Achievable rates [bit/s] or [bit/symbol]

The covariance matrix of a random vector h 
Rate region and its Pareto boundary 

The symbol power $q = P/B$ [Joule]

The symbol power assigned to the kth channel 
Diagonal matrix with $q_1, \ldots, q_K$

The rank of a matrix 

Number of subcarriers in OFDM 

Diagonal matrix with singular values 

The kth singular value in the SVD of a matrix 
Variance of the noise [W] or of another variable 
The RCS of a target object 

SNR variable 

A time variable [s] 

Propagation delay of path i [s] 

Elevation angle 

The memory of an FIR filter [symbols] 

Channel coherence time [s] 

Speed of movement [m/s] 

Unitary matrices, often obtained from the SVD 
A receive combining vector 

Arbitrary digital and hybrid combining matrices 
Abbreviations


The following acronyms and abbreviations are used in this book: 


2D two-dimensional
3D three-dimensional
3GPP 3rd generation partnership project (an organization)
5G, 4G, 3G, 2G fifth/fourth/third/second generation
ADC analog-to-digital converter
AESA active electronically scanned array
AWGN additive white Gaussian noise
BB baseband
BBU baseband unit
CDF cumulative distribution function
CDMA code-division multiple access
CDMA2000 name of a CDMA-based 3G standard
CSI channel state information
DAC digital-to-analog converter
dBi decibels referenced to an isotropic antenna
dBm decibels referenced to 1 mW
DFT discrete Fourier transform
dl downlink
DOA direction-of-arrival
DPC dirty paper coding
DSFT discrete-space Fourier transform
DTFT discrete-time Fourier transform
eCDF empirical cumulative distribution function
EIRP effective isotropic radiated power
ELAA extremely large aperture array
ESPRIT estimation of signal parameters by rotational invariance techniques
EV-DO Evolution-data optimized (3G standard)
FDMA frequency-division multiple access
FIR finite impulse response
GSM Global system for mobile communications (2G standard)
i.i.d. independent and identically distributed
IDFT inverse DFT
IEEE Institute of electrical and electronics engineers
IRS intelligent reflecting surfaces
IS-95 Interim standard 95 (2G standard)
ISAC integrated sensing and communication
ITU International telecommunication union
LDPC low-density parity-check
LMMSE linear MMSE
LNA low-noise amplifier
LOS line-of-sight
LTE Long-term evolution (4G standard)
LTI linear time-invariant
MCS modulation and coding scheme
MIMO multiple-input multiple-output
MISO multiple-input single-output
ML maximum likelihood
mmf max-min fairness
MMSE minimum mean-squared error
mmWave millimeter-wave
MRC maximum-ratio combining
MRT maximum-ratio transmission
MSE mean-squared error
MUSIC multiple signal classification
MVDR minimum-variance distortionless response
NFC Near-field communication (a wireless standard)
NLOS non-LOS
NMSE normalized MSE
NOMA non-orthogonal multiple access
NR New radio (5G standard)
OAM orbital angular momentum
OFDM orthogonal frequency-division multiplexing
OMA orthogonal multiple access
opt optimal
PA power amplifier
PAM pulse-amplitude modulation
PDF probability density function
PEC perfect electric conductor
PESA passive electronically scanned array
PS phase shifter
PSS primary synchronization signal
QAM quadrature amplitude modulation
RCS radar cross section
RF radio-frequency
RIS reconfigurable intelligent surfaces
RMSE root MSE
RZF regularized zero-forcing
SDMA space-division multiple access
SIC successive interference cancellation
SIMO single-input multiple-output
SINR signal-to-interference-plus-noise ratio
SISO single-input single-output
single-input single-output 




SLNR signal-to-leakage-and-noise ratio 
SNR signal-to-noise ratio 

sr sum rate 

STBC space-time block code 

su single user 

SVD singular-value decomposition 
TDMA time-division multiple access 
TDOA time-difference-of-arrival 

TOA time-of-arrival 

TTD true time delay 

TWF transmit Wiener filter 

ul uplink 

ULA uniform linear array 

UMi urban microcell 

UMTS Universal mobile telecommunications system (3G standard) 
UPA uniform planar array 

WiFi Trademark used for WLAN 
WLAN wireless local area network 
XPD cross-polar discrimination 


ZF zero-forcing 